Abstract

The classification of disturbances in power systems is an important task in an
automated power quality assessment system. This thesis work concentrates mainly
on the design of a classifier for disturbances in power systems. It uses characteristic
features of disturbances for the design and evaluation of the classification system.
The simulation of the classifier is done with artificially generated disturbance data,
using known ranges of various disturbance features. Various classification
techniques (probabilistic, fuzzy, neural network and geometric) are tested to
design a suitable classifier for power quality disturbance classification. The
suggested classifier uses a parallel classification structure of three selected classifiers.
The requirements of the classifier include assessment of the type of disturbance,
quality of classification and adaptability to new, unknown disturbances. A
sequential classification approach is also implemented for the classification of
superimposed disturbances.
1 Introduction


Power quality (PQ) has always been a topic of consideration in power systems, but
increasing attention has been given to it in recent years. The interest in the quality
of power involves all three parties concerned with the power business: utility
companies, equipment manufacturers and electric power consumers. Many reasons are
responsible for the growing concern with PQ:

• end-user equipment has become very sensitive to PQ as a result of the
wide range of microprocessor-based applications,

• complexity and interconnection of industrial processes: a restart of
processes is costly,

• development and application of sophisticated power electronics: these
devices are both source and victim of PQ disturbances,

• deregulation of the power market: PQ as a product feature,

• business competition causes rationalisation: the reliability of electric power,
and with it PQ, decreases.

These are convincing reasons to monitor and assess the PQ in power systems.
So far, PQ diagnosis is very time-consuming because of the large amount of
recorded data and its manual analysis. An automatic technique of PQ analysis
is therefore desirable.

The project "Development of a System of Automated Classification and Assessment
of PQ Disturbances" pursues the approach of identifying the disturbances by
their characteristic features. The analysis of the PQ disturbances is to be realised
with information about type, location and statistical parameters (frequency of
occurrence, etc.). For this purpose, methods of pattern classification, an application
of artificial intelligence, are used. The goal of the present thesis is the application
of pattern classification techniques to the PQ problem. The choice of a
suitable classifier represents the main focus. In the following chapter the principles
of pattern classification and of the development of a pattern classification
system are introduced. In Chapter 3 the application of pattern classification to
PQ is described. Afterwards, the studied classifiers are presented. The methods
for classifier testing and the test results are given in Chapter 5. The selected classifier
system and its realisation in Matlab are shown in Chapter 6. Finally, the results of
this thesis and new proposals are summarised.



2 Classification Basics


The goal of pattern recognition systems is the classification of objects (patterns)
into a number of categories or classes. Pattern recognition has many practical
applications; probably the most popular one is character recognition.
Here, the patterns are described by pixels of digitized character images, which are
mapped to classes: the letters of the alphabet.

2.1 Principles of Pattern Classification

In the preceding example, pixels of images are used to represent patterns. Such
measurable qualities of patterns are called features. In the general case, $n$ features
$x_i$, $i = 1, 2, \dots, n$, are used and form the feature vector

$$\mathbf{x} = [x_1, x_2, \dots, x_n]^T \qquad (2.1)$$

Graphically it can be shown as in Figure 2.1. Each feature vector describes
uniquely a single pattern (object).

Figure 2.1: Feature vector representation [2]

The patterns are assigned to a finite set of classes

$$\Omega = \{\omega_1, \omega_2, \dots, \omega_k\} \qquad (2.2)$$

where $k$ is the number of classes.

In a mathematical sense, classification is a mapping from feature space to decision
space, stated as follows:

$$\delta : \mathbb{R}^n \rightarrow \Omega \qquad (2.3)$$

To illustrate the classification task, the feature space spanned by two-dimensional
feature vectors is shown in Fig. 2.2.

The circles and crosses represent feature vectors of sample patterns of two different
categories. The straight line is known as the decision line, which constitutes
the classifier. Its role is to divide the feature space into regions which belong
to either class $\omega_1$ or class $\omega_2$. If the feature vector of an unknown pattern falls
into the region of $\omega_1$, it is classified as class $\omega_1$. But this does not imply that
the decision is correct. If it is not, a misclassification has occurred.
How the whole pattern classification system is embedded in a real-world scheme
is depicted in Fig. 2.3.
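
The role of the decision line can be illustrated with a minimal sketch. The line
coefficients and the sample points below are illustrative assumptions, not values
from this thesis; the sketch only shows how the sign of a linear function assigns a
two-dimensional feature vector to one of two classes, and how a misclassification
is detected when the true class is known.

```python
def classify(x, w, b):
    """Assign a 2-D feature vector x to class 1 or 2 using a linear decision line.

    The decision line is w[0]*x[0] + w[1]*x[1] + b = 0 (w and b are illustrative).
    """
    g = w[0] * x[0] + w[1] * x[1] + b  # positive on one side of the line, negative on the other
    return 1 if g > 0 else 2


# Hypothetical decision line x1 + x2 - 1 = 0
w, b = (1.0, 1.0), -1.0

# (feature vector, true class) pairs; the last one lies on the "wrong" side of the line
samples = [((0.9, 0.8), 1), ((0.2, 0.1), 2), ((0.7, 0.2), 1)]
misclassified = [x for x, true_class in samples if classify(x, w, b) != true_class]
print(misclassified)
```

The third sample belongs to class 1 but falls into the region of class 2, so it is
the only entry in `misclassified`.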

Figure 2.3: Generalised structure of pattern classification system

Generally, a pattern classification system consists of the following three steps:

I) Feature Extraction: Some features that express the patterns well are
extracted from the patterns in the real world. The features are usually
expressed as numerical values. The patterns will be classified in this feature
space.

II) Construction of a Classifier and Testing: A classifier is constructed on the
basis of training samples in the feature space. In this step, the number of
training samples is usually limited. In the testing stage, the performance
of the classifier can be judged on the basis of the number of misclassified
samples out of the total number of known test samples of a class. Usually,
the test samples are different from the training samples.

III) Classification of Unknown Samples: Unknown samples are classified using
the classifier. When the classifier does not have sufficient classification
power, many unknown samples are misclassified.
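
Steps II and III can be sketched in a few lines. The example below assumes a
nearest-class-mean (Euclidean distance) classifier, which is only one possible
choice and is used here for brevity; the feature vectors and class labels are
made-up, already normalised values, and all function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def train_nearest_mean(training_samples):
    """Step II: compute one mean feature vector per class from (vector, label) pairs."""
    sums, counts = {}, {}
    for x, label in training_samples:
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def classify(x, class_means):
    """Step III: return the label whose class mean is closest to x."""
    return min(class_means, key=lambda label: euclidean(x, class_means[label]))


def count_misclassified(test_samples, class_means):
    """Testing: number of test samples whose predicted label differs from the known one."""
    return sum(1 for x, label in test_samples if classify(x, class_means) != label)


# Illustrative 2-D feature vectors (assumed to be extracted and normalised already)
training = [((10, 12), "sag"), ((14, 8), "sag"), ((80, 85), "swell"), ((90, 75), "swell")]
test = [((20, 15), "sag"), ((70, 90), "swell"), ((55, 60), "sag")]

means = train_nearest_mean(training)
errors = count_misclassified(test, means)
print(errors, "of", len(test), "test samples misclassified")
```

The test samples are kept separate from the training samples, as described in
step II; the third test sample lies closer to the "swell" mean and is therefore
counted as a misclassification.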


2.2 Design of Pattern Classification Systems

The design of a pattern classification system for a given classification task can be
divided into the following stages:

1. Determination of patterns and classes

2. Feature generation

3. Feature selection

4. Classifier design

5. System evaluation

Figure 2.4 shows the various stages. As can be seen from the feedback arrows,
these stages are not independent. The stages are interrelated, and to improve the
overall performance of the system one may go back and redesign earlier stages.
This makes the design of a classifier system a difficult optimisation process.
To facilitate a good design, there are some fundamental rules for each stage,
which are pointed out as follows.
Figure 2.4: The basic stages involved in the design of a classification system

STAGE I: Determination of patterns and classes

At the first stage the designer has to analyse the process to which a classifier
system is to be applied. The questions to ask are: what are the possible events
of the process, and how many events are there? It is important to notice that
the structure of the classifier is set at this early stage. For the simple structure of
an individual classifier, the classes should represent unique events of the process.
This means that the classes are exclusive. In contrast, when classes coincide or
when classes and sub-classes come into question, a hierarchical classifier
structure is recommended. It is often helpful to define a class that represents
the normal condition of the process. This eases the performance tuning of the
classifier.

STAGE II: Feature Generation

Here, the designer asks for the typical patterns of the process which could
be mapped to the classes. It is essential to use significant patterns of the process.
From these patterns, those features are to be extracted which are measurable and
can be represented numerically (as a number or a boolean). These features, combined
into a feature vector, form a sample vector which can be mapped to a certain
class. Every feature has a range of measurement, called the feature range, which has
subranges for different disturbances. Feature ranges differ from one another
in their units. It is therefore required to normalise all feature ranges to
the same scale, e.g. 0-100 % or 0-1 pu, before classification. For an excellent
performance of a classifier system, a high number of sample vectors is required.
But there are cases where the process provides only a small number of samples or
the classifier is to be designed on the fly. In that case, modelling the process and
generating artificial feature samples is suggested. Such simulated data offers
some advantages:

• cheaper than real data,

• easy and fast to generate,

• more flexible with respect to feature ranges and distributions.
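
The generation of artificial feature samples can be sketched as follows. This is a
minimal illustration under stated assumptions: the class names, feature names and
ranges are hypothetical placeholders, and a uniform distribution within each range
is assumed because the text does not prescribe a distribution.

```python
import random

# Hypothetical feature ranges (min, max) per class; placeholders for illustration only
FEATURE_RANGES = {
    "sag":   {"v_rms_pu": (0.1, 0.9), "duration_ms": (5.0, 100.0)},
    "swell": {"v_rms_pu": (1.1, 1.4), "duration_ms": (5.0, 100.0)},
}


def generate_samples(ranges, n_per_class, seed=0):
    """Draw n_per_class feature vectors per class, uniformly within each feature range."""
    rng = random.Random(seed)  # fixed seed so that the artificial data set is reproducible
    samples = []
    for label, features in ranges.items():
        for _ in range(n_per_class):
            vector = [rng.uniform(lo, hi) for lo, hi in features.values()]
            samples.append((vector, label))
    return samples


samples = generate_samples(FEATURE_RANGES, n_per_class=50)
print(len(samples), "artificial training sets generated")
```

Changing the ranges or replacing `rng.uniform` by another distribution is a
one-line change, which is the flexibility referred to in the list above.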

STAGE III: Feature Selection

Often the process provides a lot of features that describe a certain class. Out of
them, those features are to be selected which offer the most significant characteristics
and allow optimal class separability. Considering the feature space, the requirement
for optimal class separability is pictured in Fig. 2.5:
(c) shows the best, (b) the worst and (a) a moderate separability.

Figure 2.5: Classes with (a) small within-class variance and small between-class
distances, (b) large within-class variance and small between-class distances
and (c) small within-class variance and large between-class distances

Another consequence is that a high number of features means that the dimension of
the resulting feature vector is high. For fast classification, a low complexity of the
feature space is advantageous. This implies that the features are to be analysed
for redundant information. Generally, linear dependencies between features should
be avoided. The example of three features illustrates this: height, length, surface.
Here, either surface or the two other features are redundant. This example poses
another problem: which one is more redundant than the others? Here the principal
component analysis (PCA) is a useful tool to identify linear dependencies
between different features (more about that in Chapter 3).

STAGE IV: Classifier Design

The classifier itself is the core of the pattern classification system and embodies
the artificial intelligence of the whole system. The classifier realises the link
between the feature vectors and the classes. It decides to which event an occurring
pattern is assigned. The decision is based on the comparison of unknown with
known data. As a consequence, the classifier has to be made familiar with possible
data beforehand. In terms of pattern recognition, the classifier has to be trained with
a priori information. Here one has to distinguish between supervised and
unsupervised learning. In the case of supervised learning, the appropriate class is
known for each sample vector. In the case of unsupervised learning,
the information about class membership is missing. Here the goal is to unravel the
underlying similarities of the sample vectors and to cluster (group) similar vectors
together. In this work we deal only with supervised pattern recognition; for
unsupervised learning the interested reader is referred to the literature [2]. Basically,
the classifiers can be divided into three types according to their mathematical background:

• linear classifiers (e.g. Euclidean distance classifier),

• nonlinear classifiers (e.g. ANN, fuzzy and polynomial classifiers),

• probabilistic classifiers (Bayes classifier).

It is hardly possible to decide in advance which type of classifier is best suited for a
given application (the main focus of this work is to find the best classifier for the
PQ problem). The decision depends on many factors, which are considered in
Chapter 4.

STAGE V: System Evaluation

When a certain classifier is chosen, it has to be trained. The number of
training sets (set = sample vector + class information) has a large influence on
the performance of the classifier. The question here is how many training sets are
necessary. The answer is simple but imprecise: the more, the better.
To test the classifier, it has to be confronted with unknown data (generalisation).
Then the outputs of the classifier are compared to the real class membership of
the test sets. In case of poor results, i.e. when a large percentage of the outputs
does not match the a priori information, improvements have to be made. Figure 2.4
shows that optimisation is possible at any stage. Any change in a
stage should therefore be followed by a system evaluation, until the results are
satisfactory.




3 Adaptation to PQ problems


We have to know the various characteristics of the different events in a power system
and their interrelation before determining the classes. For events occurring
simultaneously (mixed or dependent events), the classifier has to be designed
for classification in more than one stage. Determining the relation between power
system events and the various features is one of the important tasks in feature
selection. Information about the range of feature values is required before feature
vectors of the different classes can be generated.


3.1 Structure of PQ Classification System

Figure 3.1: Generalised structure of PQ classification system

PQ assessment is a sequence of five steps, as shown in Figure 3.1. Data acquisition
is an interface between the power line measurement system and the PQ analysis
platform. The event trigger recognises when there is a new event on the line and
provides the event waveform pattern to the next stage. In the feature extraction stage,
sufficient features are extracted from the disturbance pattern. The classifier detects
the disturbance type from the feature vector. The PQ of the event can be judged from
the disturbance type together with added information on the duration and magnitude
of the event and its rate of occurrence. The main focus of this thesis work is the
classification of PQ disturbances. In the following sections we discuss some steps
of the system design for the PQ classification problem.

3.2 Determination of Patterns and Classes

Analysis of the PQ problem provides events called disturbances. The information
about every disturbance is available in terms of a specific waveform, e.g. the voltage
waveform over a time range, called a pattern. This pattern can be transformed into
several signal quantities, e.g. peak value, RMS (root mean square) value,
duration, rise time, THD (total harmonic distortion) value, dominant interharmonics
(DIH), etc., which form a vector of features called a feature vector.
Different disturbances differ in their special characteristics of various features.
Disturbances which differ in the ranges of the same discriminating features
can be considered for classifier design; e.g. impulsive transient, oscillatory transient,
voltage sag, voltage swell, overvoltage and undervoltage need information
about $V_m$, $V_{rms}$, $t_r$, $T_e$ and DIH. As mentioned in STAGE I of Section
2.2, every disturbance occurring exclusively defines one class. Feature vectors
(patterns) of the same kind of disturbance are assigned to the same class. Feature
vectors having mixed characteristics of two independent disturbances can be classified
by the same classifier structure, without considering them as special classes, by
providing some more features to the classifier. In that case, one class is necessary
which defines the normal event, meaning that there is no disturbance, for reference.
More about this will be discussed in Chapter 4. The selection of classes is also limited
by the information available after the feature extraction stage of the classification
system; e.g. voltage unbalance needs information about the voltages of all three phases
at the same time. If the patterns do not provide that information, it is not
possible to classify such disturbances.

The major types of non-stationary power system disturbances, which persist for
a certain duration only, related to PQ diagnostics are:

• Impulsive transients 

• Repetitive impulsive transients 

• Oscillatory transients 

• Voltage swells (surges) 

• Voltage sags (dips) 

• Overvoltages 

• Undervoltages 

• Short interruptions 
• Sustained interruptions

• Voltage fluctuation (flicker)

• Voltage imbalance

• Power frequency variations

• DC offset (distortion)

• Harmonics (distortion)

• Notching (distortion)

• Interharmonics (distortion)

• Noise (distortion) 

3.3 Features of PQ Events

The PQ events are characterised by the analysis of the disturbance signals. In
the following, possible features are listed.

Peak voltage $V_m$: The peak (maximum) voltage measured within the observation
window. It is the peak value of the impulse if there is one; otherwise it is equal
to the peak value of the sinusoidal voltage waveform under normal operation.

Phase angle (placement of the event on the sine wave): The angle between the
positive zero crossing of the fundamental component and the start of the disturbance.
It is particularly important for transients and notching.

Voltage magnitude $V_{rms}$: The RMS value of the measured voltage.

Rise time $t_r$: As per the standard definition of the rise time of a waveform or impulse,
the time from 10 % to 90 % of the front side of the waveform or impulse. One
can replace this feature by the rate of rise.

Decay time $t_d$: As per the standard definition, the time from 100 % to 50 % of the
tail side of the waveform. The rate of decay is an alternative to the decay
time.

Duration of event $T_e$: The time period during which the disturbance is found
in the waveform. It varies from milliseconds to steady state, depending on
the type of disturbance.

Frequency spectrum: An analysis of the waveform over a large range of
frequencies.

Dominant interharmonic: The frequency of the interharmonic with the highest
RMS voltage besides the fundamental voltage.

Notch depth

Frequency of occurrence: It can be measured on the basis of the number of times
the same disturbance occurred in one [...]

Notch area

Percentage odd/even harmonics

Total harmonic distortion (THD): The definition of THD is given as follows:

$$V_{THD} = \frac{\sqrt{\sum_{i \geq 2} V_i^2}}{V_1}$$

where $V_{THD}$ expresses the RMS voltage due to all harmonic components relative to
the fundamental, $V_i$ is the RMS value of the $i$-th harmonic voltage and $V_1$ is the
RMS voltage of the fundamental.

3.4 Feature Selection for PQ Problem

For feature selection for the PQ problem, typical ranges of power system disturbances
for some features are derived from Table 3.1 [3]. Most known power
system disturbances are categorised on the basis of their characteristics. The
basic characteristics are spectral content, duration of the phenomenon and voltage
magnitude (RMS). On the basis of these characteristics, the disturbances have some
common features and differ in the range in which these features occur.


3.4.1 Determination of distinct features

On the basis of the known typical characteristics of disturbances, various features
can be considered for their classification. The selection of features depends
on the disturbances to be classified. It is required to know the most discriminative
feature for a particular disturbance. Using reasoning and the knowledge of feature
ranges from Table 3.1, the dependency table, Table 3.2, is developed, in
which the relation of each disturbance to the different features is categorised
into three levels. It shows that features with a high relation to a particular
disturbance must be selected. Features with a medium relation should be considered,
and features with a low dependency need not be considered because they do not
Categories                   Spectral content   Duration (Te)        Magnitude (Vrms)

Transients
  Impulsive
    Nanosecond               5 ns rise          < 50 ns
    Microsecond              1 µs rise          50 ns - 1 ms
    Millisecond              0.1 ms rise        > 1 ms
  Oscillatory
    Low frequency            < 5 kHz            0.3 - 50 ms          0 - 4 per unit
    Medium frequency         5 - 500 kHz        20 µs                0 - 8 per unit
    High frequency           0.5 - 5 MHz        5 µs                 0 - 4 per unit
Short duration variations
  Instantaneous
    Sag                                         0.5 - 30 cycles      0.1 - 0.9 per unit
    Swell                                       0.5 - 30 cycles      1.1 - 1.8 per unit
  Momentary
    Interruption                                0.5 cycles - 3 s     < 0.1 per unit
    Sag                                         30 cycles - 3 s      0.1 - 0.9 per unit
    Swell                                       30 cycles - 3 s      1.1 - 1.4 per unit
  Temporary
    Interruption                                3 s - 1 min          < 0.1 per unit
    Sag                                         3 s - 1 min          0.1 - 0.9 per unit
    Swell                                       3 s - 1 min          1.1 - 1.2 per unit
Long duration variations
    Interruption                                > 1 minute           0.0 per unit
    Undervoltage                                > 1 minute           0.8 - 0.9 per unit
    Overvoltage                                 > 1 minute           1.1 - 1.2 per unit
Waveform distortion
    Harmonics                0 - 100th H        steady state         0 - 20 %
    Interharmonics           0 - 6 kHz          steady state         0 - 2 %
    Noise                    broad band         steady state         0 - 1 %
Voltage fluctuations         < 25 Hz            intermittent         0.1 - 7 %
Power frequency variations                      < 10 s

Table 3.1: Typical characteristics of power disturbances [3]


provide any useful information about that particular disturbance. In Table 3.3,
some disturbances are shown with their typical feature ranges. Here, features
should be selected such that the disturbances can be distinguished by the values
of the features alone. A feature having the same values for all disturbances should
not be considered for classification. For any disturbance, there must be at least one
feature by which it differs from all other disturbances. If a new disturbance
to classify is added, a new feature may have to be included, depending on
the characteristics of the new disturbance. The more disturbances there are to
classify, the more features are required.

Table 3.2: Feature dependency of disturbances
Note: H = high relation or dependency, M = medium relation, L = low relation



Disturbance             Vm (pu)     Vrms (pu)   T (ms)     t (ms)     THD (%)   DIH ∈ {0,1}   Δf/Δfm (%)

Impulsive transient     2 - 5       1 - 1.5     0.5 - 5    0.1 - 2    0 - 1     0             0 - 10
Oscillatory transient   2 - 5       1 - 4       0.5 - 10   0.1 - 5    0 - 25    1             0 - 10
Voltage sag             0 - 0.9     0 - 0.9     5 - 100    0.1 - 20   0 - 1     0             0 - 10
Voltage swell           1.1 - 1.4   1.1 - 1.4   5 - 100    0.1 - 10   0 - 1     0             0 - 10
Harmonics               0.9 - 1.1   0.9 - 1.1   5 - 100    0.1 - 10   1 - 5     0             0 - 10
Interharmonics          0.9 - 1.1   0.9 - 1.1   5 - 100    0.1 - 10   0 - 1     1             0 - 10
Frequency variation     0.9 - 1.1   0.9 - 1.1   6 - 100    0.1 - 10   0 - 1     0             10 - 100
No disturbance          0.9 - 1.1   0.9 - 1.1   5 - 100    0.1 - 10   0 - 1     0             0 - 10

Note:
  Vm     Peak voltage
  Vrms   Voltage magnitude
  T      Duration of event
  t      Rise time
  THD    Total harmonic distortion
  DIH    Dominant interharmonics
  Δf     Change in frequency
  Δfm    |fm - fn| = 1 Hz, fn = 50 Hz

Table 3.3: Feature ranges of disturbances


3.4.2 Normalisation of feature intervals

The feature intervals normally differ from disturbance to disturbance. They are
assessed from observations and definitions of real-world disturbances. Every
disturbance feature has a distinct interval. The features can be represented in percent
(0-100 %) or in per unit (0-1 pu). Some features with their original
range and their representation after normalisation to 100 % are shown in Table
3.4. Mathematically, the normalisation of the peak voltage feature
of the impulsive transient is stated as follows:

$$V_{normalised} = \frac{V_{original}}{5\ \mathrm{pu}} \cdot 100\,\% \qquad (3.1)$$

There are, for example, practical limitations for the rise time: it can be
measured down to a minimum value of 0.1 ms only, since the sampling rate is 20 kHz. So
we are limited in the disturbances we can classify. Here we concentrate only on
transients and short duration variations, especially instantaneous variations.
Normally a feature is represented by linear scaling in a standard unit. The units of
the different features are not the same: some are per unit values, some are
percentages and some are other units. In feature space it is important to have the
same size of features in all dimensions, because we give the same weight to all
features. In practice, the importance of the feature range representation varies
with the type of classifier. The scaling can be done on a linear or a logarithmic
basis when features are represented in feature space. Logarithmic scaling might be
important for some features, like rise time and duration, whose values vary from
fractions of a millisecond to steady state. Some uniformity in the size of the feature
vector group of every disturbance in feature space is required for better classifier
performance.
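
The linear and logarithmic scaling described above can be sketched as follows. The
function names are hypothetical, values outside a range are clipped (an assumption,
since the text does not say how out-of-range values are treated), and the lower
limit of 0.1 ms for the logarithmic rise-time scale is likewise an assumption taken
from the measurement limit mentioned above.

```python
import math


def normalise_linear(value, lo, hi):
    """Map value from [lo, hi] linearly to 0..100 %, clipping values outside the range."""
    clipped = min(max(value, lo), hi)
    return 100.0 * (clipped - lo) / (hi - lo)


def normalise_log(value, lo, hi):
    """Map value from [lo, hi] (with lo > 0) to 0..100 % on a logarithmic scale."""
    clipped = min(max(value, lo), hi)
    return 100.0 * math.log(clipped / lo) / math.log(hi / lo)


peak = normalise_linear(2.5, 0.0, 5.0)     # peak voltage, range 0..5 pu
freq = normalise_linear(50.5, 49.0, 51.0)  # system frequency, range 49..51 Hz
rise = normalise_log(1.0, 0.1, 10.0)       # rise time, assumed range 0.1..10 ms
print(peak, freq, rise)
```

With linear scaling, 2.5 pu maps to the middle of the 0-100 % range; with
logarithmic scaling, a rise time of 1 ms lies halfway between 0.1 ms and 10 ms,
although it would be close to the lower end on a linear scale.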


Feature                      Total feature range   Representation

Peak voltage                 0 - 5 per unit        0 - 100 %
Voltage (RMS)                0 - 5 per unit        0 - 100 %
Duration                     0 - 100 ms            0 - 100 %
Rise time                    0 - 10 ms             0 - 100 %
Freq. spectrum (THD)         0 - 5 %               0 - 100 %
Dominant interharmonics      0 - 10 kHz            0 or 100 %
System frequency variation   49 - 51 Hz            0 - 100 %

Table 3.4: Feature ranges and representation


3.4.3 Analysis of feature space

Power system disturbances can be represented by a vector in an n-dimensional
space, where n is the number of features. Every disturbance vector differs from another
disturbance vector in its location in the feature space. A group of disturbance
vectors of the same type creates a region (cluster) in the feature space where those
data points lie. Every such region differs from another in its size and location.
The shape of the region depends on the data distribution. Normally, the regions of
different kinds of disturbances do not overlap. In Figures 3.2 and 3.3, the data
points of different disturbances are shown in two-dimensional and three-dimensional
feature space respectively. The six disturbances considered are indicated in these
figures. The six features selected with high relation, using information from
Table 3.2, are peak voltage, RMS voltage, rise time, duration, THD and dominant
interharmonics.

Figure 3.2: 2-dimensional representation of data points


3.4.4 Reduction of dimensionality

Principal Component Analysis (PCA) is used here to remove redundant features
from the dataset before training the classifier. It is good to use data with
fewer features because the classifier becomes easier to train, the problem is
simplified and less memory is required. PCA reduces the dimensionality of the data.
PCA also provides the eigenvalues of all principal components, and on that basis
one can decide how many features are required at minimum to describe almost
all information in the original dataset. If some eigenvalues are very small
compared to the other eigenvalues, the number of redundant features can be taken as
equal to the number of these small eigenvalues. Which features are redundant
cannot be identified using PCA alone; one has to find them by trial and error,
training and testing the classifier with reduced sets of features.

The example shown in Figure 3.4 explains the technique of reducing redundant
dimensions. One can judge how many dimensions are necessary at minimum to separate
the data points of the classes from one another.
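
How the eigenvalues indicate a redundant dimension can be shown on a small
two-dimensional example. The data points below are made up so that the second
feature is almost a multiple of the first; for a 2 x 2 covariance matrix the
eigenvalues have a closed form, so no numerical library is needed. This is a sketch
of the idea only, not the procedure used for the six-feature data set.

```python
import math


def covariance_2d(points):
    """Return (sxx, sxy, syy) of 2-D points after removing the mean."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return sxx, sxy, syy


def eigenvalues_2d(sxx, sxy, syy):
    """Eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]], largest first."""
    mean = (sxx + syy) / 2.0
    radius = math.hypot((sxx - syy) / 2.0, sxy)
    return mean + radius, mean - radius


# Illustrative data: the second feature is nearly twice the first (almost linearly dependent)
points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.0), (4.0, 8.1), (5.0, 9.9)]
large, small = eigenvalues_2d(*covariance_2d(points))
explained = large / (large + small)  # share of the total variance in the first component
print(large, small, explained)
```

One eigenvalue is very small compared to the other, so a single principal
component carries nearly all of the variance and one dimension can be regarded as
redundant. As noted above, this does not tell which of the original features to drop.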
Figure 3.3: 3-dimensional representation of data points

Figure 3.4: PCA applied to a two-dimensional two-class problem. In this example
one eigenvector separates the two classes well and the other eigenvector
does not. So the components along the one eigenvector which provides high
separability of the classes are sufficient for the classification problem. For
the source of this figure see [14].


Explanation of PCA

This section describes unsupervised Hebbian learning in a simple network setting
to extract the $m$ principal directions of a given set of data, i.e. the leading
eigenvector directions of the input vectors' autocorrelation matrix.
Principal component analysis (PCA) is equivalent to maximising the information
content in the output of a network of linear units. The aim of PCA is to extract
$m$ normalized orthogonal vectors $\mathbf{u}_i$, $i = 1, 2, \dots, m$, in the input space
that account for as much of the data's variance as possible. Subsequently, the
$n$-dimensional input data (vectors $\mathbf{x}$) may be transformed to a lower
$m$-dimensional space without losing essential intrinsic information. This can be done
by projecting the input vectors onto the $m$-dimensional subspace spanned by the
extracted orthogonal vectors $\mathbf{u}_i$ according to the inner products
$\mathbf{x}^T\mathbf{u}_i$. Since $m$ is smaller than $n$, a dimensionality reduction of
the data is achieved. This, in turn, makes subsequent processing of the data (e.g.
clustering or classification) much easier to handle.

The following is an outline of a direct optimisation-based method for determining
the $\mathbf{u}_i$ vectors. Let $\mathbf{x} \in \mathbb{R}^n$ be an input vector generated
according to a zero-mean probability distribution $p(\mathbf{x})$. Let $\mathbf{u}$ denote
a vector in $\mathbb{R}^n$ onto which the input vectors are to be projected. The
projection $\mathbf{x}^T\mathbf{u}$ is the linear sum of $n$ zero-mean random variables,
which is itself a zero-mean random variable. Here the objective is to find the
solution(s) $\mathbf{u}^*$ that maximise $\langle(\mathbf{x}^T\mathbf{u})^2\rangle$, the
variance of the projection $\mathbf{x}^T\mathbf{u}$ with respect to $p(\mathbf{x})$, subject
to $\|\mathbf{u}\| = 1$. In other words, we are interested in finding the maxima
$\mathbf{w}^*$ of the criterion function

$$J(\mathbf{w}) = \frac{\langle(\mathbf{x}^T\mathbf{w})^2\rangle}{\|\mathbf{w}\|^2} \qquad (3.2)$$

from which the unity-norm solution(s) $\mathbf{u}^*$ can be computed as
$\mathbf{u}^* = \mathbf{w}^*/\|\mathbf{w}^*\|$, with $\|\mathbf{w}^*\| \neq 0$. Now, by noting that

$$\langle(\mathbf{x}^T\mathbf{w})^2\rangle = \langle(\mathbf{x}^T\mathbf{w})(\mathbf{x}^T\mathbf{w})\rangle = \mathbf{w}^T\langle\mathbf{x}\mathbf{x}^T\rangle\mathbf{w}$$

and recalling that $\langle\mathbf{x}\mathbf{x}^T\rangle$ is the autocorrelation matrix
$\mathbf{C}$, Equation 3.2 may be expressed as

$$J(\mathbf{w}) = \frac{\mathbf{w}^T\mathbf{C}\mathbf{w}}{\|\mathbf{w}\|^2} \qquad (3.3)$$

The extreme points of $J(\mathbf{w})$ are the solutions to $\nabla J(\mathbf{w}) = 0$, which gives

$$\nabla J(\mathbf{w}) = \frac{2\left[\mathbf{C}\mathbf{w}\,\|\mathbf{w}\|^2 - (\mathbf{w}^T\mathbf{C}\mathbf{w})\,\mathbf{w}\right]}{\|\mathbf{w}\|^4} = 0
\quad\text{or}\quad
\mathbf{C}\mathbf{w} = \frac{\mathbf{w}^T\mathbf{C}\mathbf{w}}{\|\mathbf{w}\|^2}\,\mathbf{w} \qquad (3.4)$$

The solutions to Equation 3.4 are $\mathbf{w} = a\,\mathbf{c}^{(i)}$, $i = 1, 2, \dots, n$,
$a \in \mathbb{R}$, where $\mathbf{c}^{(i)}$ are the unit-norm eigenvectors of $\mathbf{C}$,
ordered by decreasing eigenvalue. In other words, the maxima $\mathbf{w}^*$ of
$J(\mathbf{w})$ must point in the same or opposite direction as one of the eigenvectors
of $\mathbf{C}$. The maximum exists at $\mathbf{w}^* = a\,\mathbf{c}^{(1)}$ for some finite
real-valued $a$. Therefore, the variance of the projection $\mathbf{x}^T\mathbf{u}$ is
maximized for $\mathbf{u} = \mathbf{u}_1 = \mathbf{w}^*/\|\mathbf{w}^*\| = \pm\mathbf{c}^{(1)}$.
Next we repeat the preceding maximization of $J(\mathbf{w})$ in Equation 3.3, but with
the additional requirement that the vector $\mathbf{w}$ be orthogonal to
$\mathbf{c}^{(1)}$. Here the solution is $\mathbf{w}^* = a\,\mathbf{c}^{(2)}$; thus
$\mathbf{u}_2 = \pm\mathbf{c}^{(2)}$. Similarly, the solution
$\mathbf{u}_3 = \pm\mathbf{c}^{(3)}$ maximizes $J$ under the constraint that
$\mathbf{u}_3$ be orthogonal to $\mathbf{u}_1$ and $\mathbf{u}_2$ simultaneously. Continuing
this way, we arrive at the $m$ principal directions $\mathbf{u}_1$ through
$\mathbf{u}_m$. The projections $\mathbf{x}^T\mathbf{u}_i$, $i = 1, 2, \dots, m$, are called
the principal components of the data. For a reference on PCA see [6].

Result 4 features 


We can use PC A technique to do linear transformation from input space to re- 
duced dimensional space and also to reduce the dimensions by finding the number 
of redundant features It is useful in our problem because we want to know the 
minimum number of features required for the classification with good enough 
accuracy We also wanted to know the features which are really not affecting the 
classifiers performance 

To that end, we first found the eigenvalues along all 6 principal component
(eigenvector) directions using PCA on a large data set. We compared all 6
eigenvalues and found that 4 of them are comparable, while two are very small
compared to the other 4. We therefore concluded that two features are really
redundant.
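The eigenvalue comparison described above can be sketched as follows. This is
only an illustration on synthetic data, not the thesis data set: the function
`count_significant`, the threshold `ratio` and the way the two redundant
columns are generated are all assumptions. Six features are built so that two
of them are almost linear combinations of the other four.

```python
import numpy as np

def count_significant(X, ratio=0.05):
    """Return the covariance eigenvalues (descending) and the number of them
    larger than ratio * largest eigenvalue (the ratio is an assumed threshold)."""
    eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]
    return eigvals, int(np.sum(eigvals > ratio * eigvals[0]))

rng = np.random.default_rng(0)
base = rng.normal(size=(1000, 4))            # four informative features
noise = 0.01 * rng.normal(size=(1000, 2))
X = np.column_stack([
    base,
    base[:, 0] + base[:, 1] + noise[:, 0],   # nearly redundant feature
    base[:, 2] - base[:, 3] + noise[:, 1],   # nearly redundant feature
])

eigvals, n_significant = count_significant(X)
print(eigvals)
print(n_significant)
```

Two eigenvalues come out orders of magnitude smaller than the rest, which is
the pattern interpreted above as two redundant features.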

We tested the classifiers by treating one feature as redundant, using only the
remaining five features, and compared the results. We also calculated the
eigenvalues after removing one feature and compared them with those obtained
before the removal. If they are the same, the removed feature is redundant;
otherwise it is not. Testing every feature as the redundant one, one at a time,
and comparing the results showed that duration is redundant. A similar test was
then carried out to search for another redundant feature, and rise time was
found to be the second redundant feature. This is because impulsive transients
and oscillatory transients can be classified using the peak value and, in the
case of oscillatory transients, the dominant interharmonics. So duration and
rise time are really redundant here. The remaining four features, i.e. peak
voltage, RMS voltage, THD and dominant interharmonics, are the optimum set for
these six disturbances, which are listed as the first 6 entries in Table 3.3.
One can include no disturbance as the seventh disturbance class because it does
not need any extra feature beyond the four optimum features mentioned above.
The no-disturbance class is also included in the same table. Our final
generalisation testing of the different classifiers will be done using only
these four features for these seven disturbance classes.




4 Analysed Classifiers 


Before analysing the individual classifiers, we explain the basic structure of
classifier construction and operation. The simple structure for constructing a
classifier in general is shown in Figure 4.1. The training set is a set of data
samples together with information about their class membership, and it is used
to design the classifier. It is also known as the knowledge dataset or learning
set. During construction of the classifier, data samples of the same class are
used to create the functional parameters, or discriminant set, of that class.
This is shown as $M_i$ for the $i$-th class in the figure mentioned above; g
denotes the function that constructs $M_i$ from the training subset $T_i$.
Figure 4.2 shows the general structure of classifier operation for classifying
an unknown input vector x. The decision function f uses the discriminant set of
every class from the designed classifier to calculate the belongingness of the
input vector x to a particular class in a specific form. The class decider
decides the output class from all class function values.



Figure 4.1 Basic structure of classifier construction (subset construction of
every class; functional parameters M1, M2, ..., Mk of the classifier)

Figure 4.2 Basic structure of classifier operation (input vector x, functional
parameter, decision function, class)


4.1 Euclidean distance classifiers

It is one of the simplest pattern classifiers. It is also called the nearest
mean linear classifier or minimum distance classifier. Each object is assigned
to the class whose mean is nearest (in the Euclidean sense) in feature space.
This classifier implements a piecewise linear discrimination function between
these means. Here, the decision boundaries are the points that are equally
distant from two or more of the class templates. With the Euclidean distance
method, the decision boundary between region i and region j is the line or
plane that is the perpendicular bisector of the line from class template
$\mu_i$ to $\mu_j$. Analytically, these linear boundaries are a consequence of
the fact that the discriminant functions are linear.

For a 2-dimensional feature space, the boundary is a straight line or a set of
lines; for 3 dimensions, a plane or set of planes; and for n dimensions, n > 3,
a hyperplane or set of hyperplanes. How well the classifier works depends upon
how closely the input patterns to be classified resemble the templates.
Two-dimensional patterns with Gaussian distribution, classified by the
Euclidean distance classifier, are shown in Figure 4.3. Here, $x_1$ and $x_2$
are two features of the pattern vectors. From the figure, one can observe that
the Euclidean distance classifier misclassifies data samples that lie on the
other side of the decision boundary from their own class.

Figure 4.3 Euclidean distance classifier with normally distributed patterns

The performance of the classifier depends not only on the data distribution of
each individual class, but also on the comparative distribution of the data of
neighbouring classes. It is also affected by large variation in the size of the
regions of data points of neighbouring classes. This classifier therefore has
some limitations.

The block diagram of the Euclidean distance classifier in Figure 4.4 explains
the design of the classifier and the classification of an unknown input vector.
The knowledge database is provided for the design of the classifier. The
classifier calculates the mean vector of every class as its functional
parameter. This is all that is required to classify an unknown data sample. The
decision function is the Euclidean distance of the input vector to the mean
vectors of all classes, and the final class is decided by a final
discriminating min function on the Euclidean distances to all the class means.

Template matching can easily be expressed mathematically. Let x be the feature
vector of the unknown input, and let $\mu_1, \mu_2, \ldots, \mu_k$ be the
templates for k classes. Then the Euclidean distance, or the error in matching
x against $\mu_q$, is given by Equation 4.1. A minimum error classifier
computes $d_q$ for q = 1, ..., k and chooses the class for which this error is
minimum. Since $d_q$ is also the distance from x to $\mu_q$, it is called a
minimum distance classifier. The distance is Euclidean (linear), so we call
this a Euclidean distance classifier [2]. The mathematical representation of
the class assignment to input vector x is stated in Equation 4.2.

Figure 4.4 Euclidean minimum distance classifier block diagram

$$d_q = \sqrt{(x - \mu_q)^T [I] (x - \mu_q)} \tag{4.1}$$

where
$x = [x_1, x_2, \ldots, x_n] \in R^n$,
$I$ = unity matrix of size (n × n), and
$\mu_q = \mathrm{mean}_{j=1}^{N_q}\, p_j^q = [m_{q1}, m_{q2}, \ldots, m_{qn}] \in R^n$,
where
$N_q$ = number of data points of the $q$-th class in the training set, and
$p_j^q$ = $j$-th vector of the $q$-th class from the training set.

$$S(x) = \arg\min_{q=1}^{k} \|x - \mu_q\| \tag{4.2}$$


Because its basic principle is linear distance, it has certain limitations in
real applications. The data in the input space must be modified before being
used by the classifier in order to obtain good performance.
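A minimal sketch of the nearest-mean rule of Equations 4.1 and 4.2 is given
below. The function names and the tiny two-class training set are hypothetical
and serve only to show the two stages, design (class means) and classification
(minimum distance).

```python
import numpy as np

def fit_class_means(X, y):
    """Design stage: one mean vector (template) per class."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, means

def classify_nearest_mean(x, classes, means):
    """Classification stage: d_q of Eq. 4.1, then the arg min of Eq. 4.2."""
    d = np.linalg.norm(means - x, axis=1)
    return classes[np.argmin(d)], d

# Hypothetical training set with two classes in a 2-dimensional feature space.
X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                    [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

classes, means = fit_class_means(X_train, y_train)
label, distances = classify_nearest_mean(np.array([1.0, 1.0]), classes, means)
print(label, distances)
```

Only the k mean vectors need to be stored after the design stage, which is why
this classifier is so cheap compared with the nearest neighbour methods.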


4.2 Bayes classifier

The Bayes classifier is a mechanism which minimises the classification error.
It is a probabilistic classifier. It works on the basic principle of Gaussian
probability density functions (PDFs) of the data points in all classes. In
Figure 4.5, a Bayes classifier following Gaussian probability functions is
shown for two classes with only one feature. The classification strategy here
is to label an object with the label for which the Bayes probability is
highest. The data distributions of both classes are assumed to be normal
(Gaussian), so their PDFs will be as shown in the figure. The hatched area
represents the error that this function S(x) makes. The theoretically minimum
error that this classifier makes is called the Bayes error.

This is a statistical model with assumed PDFs. Gaussian (normal) distributions
are usually used for this purpose. The block diagram of the classifier is shown
in Figure 4.6. The parameters of each distribution, the mean vector and the
covariance matrix, are estimated on the basis of the training samples of each
class. Next, the estimated PDFs of all classes are combined in order to
classify unknown samples. We used the Bayesian quadratic classifier strategy
here. Each PDF is evaluated for an unknown sample, and the sample is then
assigned to the class with the highest value. Its classification boundary forms
a quadratic curve (or a quadratic surface in a higher-dimensional space).

Figure 4.5 Bayes classifier with Gaussian PDF function in one-dimensional
feature space. For reference see [12].

Figure 4.6 Bayes classifier block diagram

4.2.1 Bayes Decision Theory

The Bayes classifier is based on the assumption that all of the relevant
probability values are known. The a priori probabilities $P(w_i)$ for
i = 1, ..., k are assumed to be known. If no information about the a priori
probabilities is available, then one can consider unity for all classes. The
classifier is then called a maximum likelihood classifier. So the a priori
probability factor is ignored here. The class to which the random variable x
belongs can be determined from a decision rule on probabilities [9]:

$$\text{Decide } w_i \text{ for } \max_i P(w_i|x), \quad i = 1, 2, \ldots, k \tag{4.3}$$

The a posteriori probabilities $P(w_i|x)$ may be calculated from the a priori
probabilities $P(w_i)$ and the conditional density functions $p(x|w_i)$ using
Bayes' theorem, which is

$$P(w_i|x) = \frac{p(x|w_i)\,P(w_i)}{p(x)}, \quad
\text{where } p(x) = \sum_{j=1}^{k} p(x|w_j)\,P(w_j) \tag{4.4}$$

The representation of the classifier is in terms of a set of discriminant
functions $g_i(x)$, i = 1, ..., k, where k is the number of classes. The
classifier is said to assign a feature vector x to class $w_i$ if

$$g_i(x) > g_j(x) \quad \text{for all } j \neq i \tag{4.5}$$

For minimum error rate classification, the discriminant function can be
expressed as

$$g_i(x) = P(w_i|x) \tag{4.6}$$

The effect of any decision rule is to divide the feature space into k decision
regions $\Omega_1, \ldots, \Omega_k$. If $g_i(x) > g_j(x)$ for all $j \neq i$,
then x is in $\Omega_i$ and the decision rule calls for us to assign x to
$w_i$. The Bayes classifier assumes that a probability distribution is known
for the observation. In this case the distribution chosen was the Gaussian or
normal distribution. The general multivariate density for this distribution is
expressed as

$$p(x) = \frac{1}{(2\pi)^{n/2}\,|K|^{1/2}}
\exp\left[-\frac{1}{2}(x - \mu)^T K^{-1} (x - \mu)\right] \tag{4.7}$$

where x is an n-component column vector, $\mu$ is an n-component mean vector,
K is the n-by-n covariance matrix, $K^{-1}$ is the inverse of K, and |K| is the
determinant of K. Also $\mu = E[x]$ and $K = E[(x - \mu)(x - \mu)^T]$. The
covariance matrix K is always symmetric and positive semidefinite. The diagonal
elements of K are the variances and the off-diagonal elements are the
covariances. The Bayes probability $p_i(x)$ that a test vector x belongs to the
$i$-th class is taken as defined in Equation 4.7, with the covariance matrix K
taken for the $i$-th class, i.e. $K_i$. If no covariance exists between
different features and all features have unit variance, then $K_i = I$ and the
Bayes classifier works like the Euclidean classifier. The mathematical
representation of the class assignment to input vector x is as follows:

$$S(x) = \arg\max_{i=1}^{k} p_i(x) \tag{4.8}$$

In practical problems, however, an infinite number of learning objects is never
available. For that reason, the true probability density functions and the a
priori probability information are not available. In this case the error will
probably be higher than the Bayes error. Classifier construction is then based
on the assumption that the available learning objects represent the true
probability density functions. It is known as an ideal classifier because it
has the highest generalisation performance among classifiers when that
assumption holds. The other classifiers are compared to this classifier to
evaluate their performance. The limitation of this classifier is its dependence
on the data distribution of the knowledge data.
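The quadratic Bayes rule of Equations 4.7 and 4.8 can be sketched as below,
with equal priors as in the maximum likelihood case described above. The
function names and the synthetic training data are assumptions for
illustration. The log of Equation 4.7 is used because it has the same arg max
and is numerically safer.

```python
import numpy as np

def fit_gaussians(X, y):
    """Estimate the mean vector and covariance matrix K_i of every class."""
    return {int(c): (X[y == c].mean(axis=0), np.cov(X[y == c], rowvar=False))
            for c in np.unique(y)}

def log_density(x, mean, cov):
    """Logarithm of the multivariate Gaussian density of Eq. 4.7."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)   # (x - mu)^T K^-1 (x - mu)
    return -0.5 * (mean.size * np.log(2 * np.pi) + logdet + maha)

def classify_bayes(x, params):
    """Eq. 4.8 with equal priors: pick the class with the highest density."""
    scores = {c: log_density(x, m, K) for c, (m, K) in params.items()}
    return max(scores, key=scores.get)

# Hypothetical training data: two Gaussian classes with different covariances.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0.0, 0.0], [1.0, 1.0], size=(100, 2)),
               rng.normal([4.0, 4.0], [0.5, 2.0], size=(100, 2))])
y = np.repeat([0, 1], 100)

params = fit_gaussians(X, y)
print(classify_bayes(np.array([0.0, 0.0]), params))
print(classify_bayes(np.array([4.0, 4.0]), params))
```

Because each class keeps its own covariance matrix, the boundary between the
two classes is quadratic rather than a perpendicular bisector.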

4.3 Euclidean k-nearest neighbour classifier

Nearest neighbour algorithms are a well-known, attractive and simple form of
non-parametric classifier. Given a set of stored training samples (prototypes),
an unknown sample $x_o$ is classified by computing the distance in input space
from $x_o$ to each of the training samples. It is assigned the class label of
its nearest neighbour. A neighbour is deemed nearest if it has the smallest
distance in the Euclidean sense in feature space. Euclidean distance is the
simplest of all types of distance.

The NN classifier is a special case of the k-NN classifier where k = 1. The k
nearest prototypes are selected to determine the class of the unknown sample
$x_o$ by majority decision of the k nearest neighbours. In other words, the
k-NNC selects those prototypes whose distance rank from the input $x_o$ is less
than or equal to k. The decision for input $x_o$ is then the most frequently
occurring class among the chosen k nearest neighbours. The value of k, the
number of neighbours to be considered, is chosen by the user. It is normally
1 5% of the average number of data points of one class in the input space. The
basic principle of the k-nearest neighbour classifier is shown in Figure 4.7
for k = 10.

Figure 4.7 Basic principle of k nearest neighbour classifier

The block diagram of the k-nearest neighbour classifier developed here is shown
in Figure 4.8. All knowledge data samples are used to calculate the Euclidean
distance from the unknown input vector in the k-NNC, as stated in the block
diagram. To classify the vector, two classifiers work in parallel. The first is
the NNC, which classifies the input vector by searching for the nearest
neighbour and assigning its class to the unknown vector, as mentioned before.
The second is the k-NNC, which provides information about the quality of
classification. It finds the class of majority belongingness among the k
nearest neighbours, as mentioned previously, and counts the number of
neighbours belonging to every class. This provides information about the
location of the input vector in the space and about the quality of its
classification, through the percentage belongingness to other classes.

Figure 4.8 k nearest neighbour classifier block diagram

Mathematically, the NN classifier can be represented as defined in Equation 4.9
and the k-NNC as defined in Equation 4.10.

$$S(x_o) = S(x_j), \quad \text{where } j = \arg\min_{i=1}^{N_{all}} \|x_i - x_o\|,
\quad \text{where } N_{all} = \sum_{q} N_q \tag{4.9}$$

$$S(x_o) = \arg\max_{i} n_i, \quad
\text{where } n_i = \text{number of neighbours from the } i\text{-th category} \tag{4.10}$$

The discrimination function, or decision surface, of the NN classifier will in
general be a jagged, piecewise linear function, since it is influenced by each
object available in the learning set. It becomes smoother in the k-NNC because
the decision is based on k points rather than only one point as in the NNC. A
disadvantage of this method is its large computing power and storage
requirement, since classifying an object requires calculating its distance to
all the objects in the learning set.
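The parallel use of the NNC and the k-NNC described in this section can be
sketched as follows. The function `knn_classify` and the small training set,
which contains one class-1 sample lying close to class 0, are hypothetical. The
sketch returns the nearest-neighbour label, the majority label and the share of
neighbours per class as a simple quality indicator.

```python
from collections import Counter

import numpy as np

def knn_classify(x0, X_train, y_train, k):
    """Return (NN label, k-NN majority label, share of neighbours per class)."""
    d = np.linalg.norm(X_train - x0, axis=1)      # distance to every prototype
    order = np.argsort(d)
    nn_label = int(y_train[order[0]])             # Eq. 4.9
    votes = Counter(y_train[order[:k]].tolist())  # neighbours per category
    knn_label = votes.most_common(1)[0][0]        # Eq. 4.10
    share = {c: n / k for c, n in votes.items()}
    return nn_label, knn_label, share

# Hypothetical prototypes; the last class-1 sample is an outlier near class 0.
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                    [5.0, 5.0], [5.0, 6.0], [6.0, 5.0], [1.2, 1.2]])
y_train = np.array([0, 0, 0, 1, 1, 1, 1])

nn_label, knn_label, share = knn_classify(np.array([1.0, 1.0]), X_train, y_train, k=3)
print(nn_label, knn_label, share)
```

In this example the single nearest neighbour is the outlier, while the majority
of the three nearest neighbours belongs to class 0; the share per class makes
that disagreement visible.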


4.4 Fuzzy classifier

The fuzzy rule-based classification approach is sufficiently transparent that
the designer can understand the decision process and easily apply changes
and/or modifications to it. There are different ways to develop fuzzy rules and
fuzzy membership functions from a training dataset. Every class has one fuzzy
rule, and every fuzzy rule has one membership function for every feature.
Membership functions are parameterised using the min-max limits obtained by
clustering the training samples of the same class and projecting every cluster
onto all features. For example, the creation of equilateral triangular
membership functions for three classes in a two-dimensional feature space is
shown in Figure 4.9. In this method only the average of all data points and the
width of the cluster over all features are considered to create the limit
parameters of the membership functions on every feature. For reference to this
classifier, please refer to [10].

Figure 4.9 Creating equilateral triangular membership functions by projecting
the min-max limits onto the individual feature axes


4.4.1 Assumptions for design of fuzzy classifier

• The data distribution over all features is continuous.

• The distribution of the data points inside the cluster is symmetrical about
the mid point of the region of the cluster.

• The intersection of two different class clusters is empty.

Figure 4.10 Fuzzy pure classifier block diagram


The block diagram of the fuzzy classifier developed here is shown in Figure
4.10. The requirements of the fuzzy classifier are the mean values and the
minimum and maximum limits of every feature range for all classes. With that
information it generates the membership function for every feature of all
classes. If the knowledge data set is provided, it extracts this information
from that data set. It then calculates the membership function values for all
feature components of the input vector x, and finally the output class is
evaluated using the max-min Mamdani rule on the membership value matrix of size
(n × k).

4.4.2 Creation of membership functions

The patterns are vectors $x = [x_1, \ldots, x_n] \in R^n$, and the set of k
classes is a crisp subset of R. The pattern features are represented by fuzzy
sets and the classification is described by a set of linguistic rules. There is
a variable number of fuzzy sets for different features. Let $p_{ij}$ be the
representation of the $j$-th feature of the $i$-th class pattern $p_i$ from the
training set, and let $x_j$ be the representation of a sample value of the
$j$-th feature. Let $\mu_{ij}$ be the membership function of the $i$-th class
for the $j$-th feature.






Mathematically, the steps of membership function generation are as follows:

(Equations 4.11 to 4.21; the equation bodies are missing from this text.)


4.4.3 Fuzzy classification strategy

We implemented a Mamdani fuzzy inference rule, called the max-min fuzzy rule,
for fuzzy classification. The max-min concept used here for the t-norms and
t-conorms is stated below:

$$\text{fuzzy AND:} \quad \mu(x) = \min[\mu_1(x_1), \ldots, \mu_n(x_n)]$$

$$\text{fuzzy OR:} \quad \mu(x) = \max[\mu_1(x_1), \ldots, \mu_n(x_n)]$$

$$\text{fuzzy NOT:} \quad \bar{\mu}_j(x) = 1 - \mu_j(x) \tag{4.22}$$

The linguistic form of the fuzzy min rule is as follows:

$$\text{IF } x_1 = A_i \text{ AND } x_2 = B_i \text{ THEN } x = C_i \tag{4.23}$$
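The max-min strategy of this section can be sketched as below. The exact
membership formulas of this work are not reproduced here, so the construction
in `fit_triangles` is an assumption: a symmetric triangle per class and
feature, centred on the class mean, with a half-width taken from the min-max
limits of the training samples (assumed non-zero). All names and data are
hypothetical.

```python
import numpy as np

def fit_triangles(X, y):
    """Per class: centre (mean) and half-width (from min/max) on every feature."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        centre = Xc.mean(axis=0)
        half_width = np.maximum(centre - Xc.min(axis=0), Xc.max(axis=0) - centre)
        params[int(c)] = (centre, half_width)
    return params

def membership(x, centre, half_width):
    """Symmetric triangular membership: 1 at the centre, 0 at and beyond the limits."""
    return np.clip(1.0 - np.abs(x - centre) / half_width, 0.0, None)

def classify_max_min(x, params):
    """Fuzzy AND (min) over the features of each rule, then max over the classes."""
    strength = {c: float(membership(x, m, w).min()) for c, (m, w) in params.items()}
    best = max(strength, key=strength.get)
    return (best if strength[best] > 0 else None), strength

# Hypothetical training samples: two classes, two features.
X = np.array([[1.0, 10.0], [2.0, 12.0], [3.0, 11.0],
              [8.0, 20.0], [9.0, 22.0], [10.0, 21.0]])
y = np.array([0, 0, 0, 1, 1, 1])

params = fit_triangles(X, y)
label, strength = classify_max_min(np.array([2.5, 11.2]), params)
print(label, strength)
print(classify_max_min(np.array([5.0, 15.0]), params)[0])   # outside every rule
```

An input whose membership is zero in every rule is returned as `None`, which
corresponds to an unknown disturbance rather than a forced class assignment.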

For example, the Mamdani fuzzy inference rule for two classes and two features
is shown in Figure 4.11. Here two features are defined as X and Y, and there
are two fuzzy AND rules. $A_1$, $B_1$ and $A_2$, $B_2$ are the membership
functions of features X, Y for rule 1 and rule 2 respectively. As shown in the
figure, each rule uses the min function for its output calculation. The output
is Z. The max function can be applied to obtain the final defuzzified value of
Z. The defuzzification technique for the output of the Mamdani rule can be the
centre of area method, as shown in Equation 4.24. Instead of defuzzifying with
the centre of area method, the max rule is applied here for the final
identification of the class. The max-min Mamdani rule applied here is stated in
Equation 4.25.

Figure 4.11 Mamdani fuzzy model for two features and two classes

$$Z^{*} = \frac{\int \mu_C(z)\, z\, dz}{\int \mu_C(z)\, dz} \tag{4.24}$$

$$S(x) = \arg\max_{i=1}^{k} \min_{j=1}^{n} \mu_{ij}(x_j) \tag{4.25}$$

4.5 RCE neural network classifier

The Restricted Coulomb Energy (RCE) classifier works on the principle of
hyperspherical classifiers, which have hyperspherical decision boundaries. The
RCE classifier takes its name from the potential function governing its mapping
characteristics, which is interpreted as a restricted form of a
high-dimensional Coulomb potential between a positive test charge and negative
charges placed at various sites [7]. The RCE net is capable of developing
proper separating boundaries for nonlinearly separable problems. The block
diagram of the RCE classifier developed here is shown in Figure 4.12. The
previously trained RCE net is used for classification of unknown samples. For
every input vector, the trained network provides a binary output vector. The
length of the binary output vector is the same as the number of classes. The
index into the binary vector represents the class. So the element at the index
of the output class is assigned logic 1 and all other elements are assigned
logic 0. The class decider detects the output class from this output vector.


Figure 4.12 RCE classifier block diagram


4.5.1 Basic principle of hyperspherical classifiers

A hyperspherical classifier stores the example patterns in Euclidean space, as
in the nearest neighbour classifier, and calculates the linear distance between
the new point and the known points in the input space. Each stored point has a
finite radius that defines its region of influence. The interior of the
resulting hypersphere represents the decision region associated with the centre
point's category. The finite radii of the regions of influence allow a
hyperspherical classifier to abstain from classifying patterns from unknown
categories. This latter feature thus enhances the classifier's ability to
reject rubbish.

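4.5.2 Classifier Development

The RCE classifier is a mapping from real to binary, defined as follows:

$$O : x \in R^n \Rightarrow \{0, 1\}^k \tag{4.26}$$

$$O_q(x) = \begin{cases} 1 & \text{if } x \text{ belongs to the } q\text{-th class} \\
0 & \text{if } x \text{ doesn't belong to the } q\text{-th class} \end{cases} \tag{4.27}$$

$$S(x) = w_q, \quad \text{if } O_q(x) = 1 \tag{4.28}$$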


The architecture of the RCE network contains two layers: a hidden layer and an
output layer. The hidden layer is fully interconnected to all components of an
input vector $x \in R^n$. The output layer consists of L units. The output
layer is sparsely connected to the hidden layer; each hidden unit projects its
output to one and only one output unit. The architecture of the RCE net is
shown in Figure 4.13. Each unit in the output layer corresponds to a pattern
category. The network assigns an input pattern to category l if the output cell
$y_l$ is activated in response to the input. The decision of the network is
unambiguous if one and only one output unit is active upon the presentation of
the input; otherwise the decision is said to be ambiguous. The transfer
characteristic of the $j$-th hidden unit is given by

$$z_j = f[r_j - D(\mu_j, x)] \tag{4.29}$$

where $\mu_j \in R^n$ is a parameter vector called the centre, $r_j \in R$ is a
threshold or radius, and D is a linear distance function between two vectors.
Here, f is the threshold activation function given by

$$f(\xi) = \begin{cases} 1 & \text{if } \xi \geq 0 \\ 0 & \text{otherwise} \end{cases} \tag{4.30}$$

Figure 4.13 RCE network architecture (input pattern, hyperspherical units
layer, OR units layer, class vector)

On the other hand, the transfer function of a unit in the output layer is the
logical OR function. The $j$-th hidden unit in the RCE net is associated with a
hyperspherical region of the input space that defines the unit's region of
influence. The location of this region is defined by the centre $\mu_j$ and its
size is determined by the radius $r_j$. According to Equation 4.29, any input
pattern falling within the influence region of a hidden unit will cause this
unit to fire. The hidden units thus define a collection of hyperspheres in the
space of input patterns. Some of these hyperspheres may overlap. When a pattern
falls within the region of influence of several hidden units, they will all
fire and switch on the output units they are connected to.
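The operation of an already trained RCE net can be sketched as follows. The
centres, radii and unit-to-class assignment below are hypothetical values
chosen for illustration, and the function names are not taken from this work.
Hidden units fire when the input lies inside their hypersphere, each output
unit is the logical OR of its hidden units, and the class decider accepts only
an unambiguous output vector.

```python
import numpy as np

def rce_output(x, centres, radii, unit_class, n_classes):
    """Binary output vector of the net for input x."""
    d = np.linalg.norm(centres - x, axis=1)   # D(mu_j, x), Euclidean distance
    z = radii - d >= 0                        # threshold units, Eq. 4.29
    out = np.zeros(n_classes, dtype=int)
    for j in np.flatnonzero(z):
        out[unit_class[j]] = 1                # logical OR per output unit
    return out

def decide(out):
    """Class decider: index of the single active output, else None."""
    active = np.flatnonzero(out)
    return int(active[0]) if active.size == 1 else None

# Hypothetical trained net: three hidden units, two categories.
centres = np.array([[0.0, 0.0], [4.0, 0.0], [4.0, 3.0]])
radii = np.array([1.5, 1.0, 1.0])
unit_class = np.array([0, 1, 1])

for x in ([0.5, 0.5], [4.0, 0.5], [10.0, 10.0]):
    out = rce_output(np.array(x), centres, radii, unit_class, n_classes=2)
    print(x, out, decide(out))
```

The third input lies outside every region of influence, so no output unit fires
and the pattern is rejected instead of being forced into a known category.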

4.5.3 Training of RCE net

Training of the RCE net involves two mechanisms: unit commitment and
modification of hidden unit radii [7]. Unit commitment involves hidden layer
units and output layer units.

Initially the network starts with no units. An arbitrary sample pattern $x^1$
is selected from the training set, and one hidden unit and one output unit are
allocated. The centre $\mu_1$ of the allocated hidden unit is set equal to
$x^1$ and its radius $r_1$ is set equal to a user-defined parameter $r_{max}$
($r_{max}$ is the maximum size of the region of influence of a hidden unit).
This unit is made fully interconnected to the input pattern and projects its
output $z_1$ to the allocated output unit (OR gate). This output unit
represents the category of the input $x^1$. Next, a second arbitrary example
$x^2$ is chosen and fed into the current network. Here one of three scenarios
emerges.

First, if $x^2$ causes the output unit to fire and $x^2$ belongs to the
category represented by this unit, then nothing is done and training continues
with a new input. If this same scenario occurs later in training, when the
network has multiple hidden and output units representing various categories,
then nothing is done provided that only the correct output unit fires. But if
the correct unit fires together with one or more output units of other
categories, then the radii of the active hidden units representing the other
categories are reduced until they become inactive.

Second, if $x^2$ does not cause the output unit to fire even though it belongs
to that category, then a new hidden unit is allocated with its centre at
$\mu_2 = x^2$ and radius $r_{max}$, and its output $z_2$ is connected to the
correct output unit. In general, the radius of a new centre is
$r = \min(r_{max}, d_{min})$, where $d_{min}$ is the distance from this new
centre to the nearest centre of a hidden unit representing any other category.
This setting of the radius may cause one or more other output units to fire
along with the correct category; the radii of the active hidden units of other
categories are then reduced as in the first scenario.

Figure 4.14 RCE net training flow chart

Third and finally, the input belongs to a new category that is not represented
by the network. Here, as in the first step of the training procedure, a hidden
unit centred at this input is allocated and its radius is set as in the second
scenario. Also, a new output unit representing the new category is added, which
receives an input from the newly allocated hidden unit. Again, if existing
hidden units become active under this scenario, then their radii are shrunk
until they become inactive. The training phase continues until all examples
have been presented, no new units are allocated and the sizes of the regions of
influence of all hidden units have converged. The flow chart for RCE net
training is shown in Figure 4.14.
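The three scenarios can be combined into one training loop, sketched below. It
is a simplified reading of the procedure, not the implementation of this work:
the function `train_rce`, the margin `eps` used to make a shrunk unit inactive
and the cap `max_epochs` are assumptions, and training samples of different
categories are assumed not to coincide.

```python
import numpy as np

def train_rce(X, y, r_max, max_epochs=100, eps=1e-9):
    """Return hidden-unit centres, radii and the category of each unit."""
    centres, radii, labels = [], [], []
    for _ in range(max_epochs):
        changed = False
        for x, c in zip(X, y):
            d = [float(np.linalg.norm(x - m)) for m in centres]
            active = [j for j in range(len(centres)) if radii[j] - d[j] >= 0]
            # Shrink active units of other categories until they stop firing.
            for j in active:
                if labels[j] != c:
                    radii[j] = d[j] - eps
                    changed = True
            # Commit a new unit if no unit of the correct category fired
            # (covers both the second and the third scenario).
            if not any(labels[j] == c for j in active):
                others = [d[j] for j in range(len(centres)) if labels[j] != c]
                radii.append(min([r_max] + others))   # r = min(r_max, d_min)
                centres.append(np.array(x, dtype=float))
                labels.append(int(c))
                changed = True
        if not changed:        # no new units and no radius change: converged
            break
    return np.array(centres), np.array(radii), np.array(labels)

# Hypothetical one-dimensional-like data: two categories on a line.
X = np.array([[0.0, 0.0], [1.0, 0.0], [4.0, 0.0], [5.0, 0.0]])
y = np.array([0, 0, 1, 1])

centres, radii, labels = train_rce(X, y, r_max=3.0)
print(centres, radii, labels)
```

On this data two hidden units are committed, and the radius of the second one
is shrunk slightly in the second pass so that it no longer covers a sample of
the other category.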


4.6 Sequential Fuzzy Classifier

The features and techniques of a special fuzzy classifier structure, developed
for the purpose of designing a mixed-disturbance classification system, are
described here.

As an overview, the method can be identified as a sequential set of fuzzy
rules. The number of features considered before reaching a decision may vary in
different parts of the classification tree. The aim of this system is to judge
the type of an unknown input disturbance when it is a mixture of two or more
known disturbances. The classifier is designed in such a way that if the input
vector has characteristics of two different classes, the sequential classifier
produces results for all possible classes of belongingness. The selection of
features allows the user to define the constraints under which the classifier
classifies different types of mixed disturbances; the classification of mixed
disturbances is therefore feature dependent. A feature is selected for mixed
disturbances together with the disturbances that depend on that feature only.
So every feature having mixing nature is responsible for judging one
disturbance, if present, when the classification tree is executed during
sequential classification. Hence, this classification technique can be used if
there is at least one disturbance, other than no disturbance, that depends on
only one feature. For reference to this classifier, please refer to [4] and
[5].

The block diagram of the fuzzy sequential classifier is shown in Figure 4.15.
Its requirements are almost the same as those of the fuzzy pure classifier
studied in Section 4.4. It needs additional information about the features and
their related disturbances in order to establish the fuzzy mixed classifier. We
will discuss it in detail in the next section. The outputs of both the fuzzy
pure classifier and the fuzzy mixed classifier are used as the total output.
Normally the fuzzy pure classifier is the first part of the fuzzy sequential
classifier and the fuzzy mixed classifier is the second part. Here we discuss
the fuzzy mixed classifier in detail, since the fuzzy pure classifier has
already been studied.

4.6.1 Description

The knowledge database consists of a training set of disturbances, including
normal voltage with no disturbance. The fuzzy sets for all disturbances on all
features are generated from the training dataset in the same way as in the
normal fuzzy pure classifier mentioned before. The equilateral triangle
membership function can be used for all classes. The test vector is first
tested with the fuzzy pure classifier, and then with the fuzzy sequential or
mixed classifier if it is not one of the pure disturbances from the database.
The output of the classifier contains more than one disturbance if the input is
mixed in nature.

Figure 4.15 Block diagram fuzzy sequential classifier

The system uses a two-step tree structure. In the first step it decides whether
there is a pure disturbance or not. If it is not a pure disturbance, it checks
for all possibilities of mixed disturbances in the second step. Each node of
the tree structure consists of fuzzy rules and yields fault measures at its
leaves. The architecture of the sequential classifier with an input disturbance
vector having four features is shown in Figure 4.17. Here, the features THD
(Total Harmonic Distortion) and DIH (Dominant Interharmonics) are considered as
having a mixed-disturbance generating nature. Figure 4.16 shows a simple
example of such a tree structure. In the sequential classifier, the user has to
provide information about the features which have mixing nature. The user has
to give information about the disturbances, other than no disturbance, that are
based on each such feature, and also about the normal voltage with no
disturbance. Every feature has different ranges for different disturbances. We
have assumed that every feature must have one and only one range for the normal
waveform, which defines no disturbance. Here f is the membership value after
the fuzzy inference rule operation on the unknown input vector x. As shown in
the tree structure, $f_1$ is always the membership value produced by the fuzzy
pure classifier, which includes all disturbances with all features of the
vector.

Figure 4.16 Classification tree structure for mixed disturbances in sequential
classifier. For reference, please refer to Appendix [5].

The fuzzy pure classifier uses the Mamdani max-min inference rule for decision
making: class $F_1$ is a pure disturbance if $f_1 > 0$. Otherwise the system
proceeds to the sequential classification, using the information about the
features having mixed nature. It finds the mixed disturbance based on each
mixed feature individually, using the max inference rule on that particular
feature. The output of this step is a vector of membership values, denoted
$f_2$, where $f_{2j}$ is the membership value after the max rule operation on
the $j$-th feature having mixing nature. It also checks the possibility of main
disturbances if there is any feature which does not have mixing nature. Here it
uses all the features which do not have the mixing property together, like the
fuzzy pure classifier, with the min operation applied for every disturbance.
This results in a vector whose size equals the number of disturbances and which
consists of one membership value for every disturbance after the min operation.
This vector is also shown in the tree-structure figure. If the value of this
vector for some disturbance is positive, then that disturbance may be the main
disturbance, which could be mixed with the other disturbances found previously.
Normally, only one disturbance has a positive membership value in this last
case. If more than one disturbance is found in this main-disturbance detection
case, then the detection is not perfect. In that case the user may reselect the
features having mixing nature by providing more features having mixing nature.

The pure disturbance classification uses the same max-min principle as used previously in the fuzzy pure classifier. The fuzzy mix classifier uses the max rule for detecting a mixed disturbance from the features having mixing nature. For the j-th feature it is stated as follows:


$$
\begin{aligned}
S_j(x) &= \{\, \mu_{d_j(i),\,j}(x_j) : i = 1, \dots, cd(j) \,\} \\
F_{2j} &= \arg\max S_j(x) \\
f_{2j} &= \max S_j(x)
\end{aligned}
\qquad (4.31)
$$

where j = 1, ..., m,
m = number of features having mixing nature,
cd(j) = number of disturbances accompanied with the j-th feature, including no disturbance,
d_j(i) = disturbance index of the i-th disturbance accompanied with the j-th feature of mixing nature.


For detecting the main disturbances from the features which do not have mixing nature, the formula is stated in Equation 4.32. The flow chart of the fuzzy sequential classifier during the testing stage is shown in Figure A.6.

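$$
f_{2(m+1)} = \max_i \min_j \mu^{*}_{ij}(x_j), \qquad
F_{2(m+1)} = \arg\max_i \min_j \mu^{*}_{ij}(x_j) \quad \text{if } f_{2(m+1)} > 0
\qquad (4.32)
$$

where \mu^{*}_{ij}(x_j) = 1 if the j-th feature has mixing nature, and \mu^{*}_{ij}(x_j) = \mu_{ij}(x_j) otherwise.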

As shown in Figure 4.17, the fuzzy sequential classifier is designed to classify mixed disturbances sequentially if the input is not a pure disturbance. Here, the features THD (Total Harmonic Distortion) and DIH (Dominant Inter harmonics) are selected as having mixing nature, so they are classified individually using the max function. The possible output of this step is therefore only one disturbance for every feature. If the disturbance content of that feature is normal, it is classified as no disturbance. For the feature THD, the resulting disturbance may be Low Harmonics or High Harmonics if the feature does not have a normal value. Similarly, for DIH the result may be Inter harmonics if the value is not normal. If the value is unknown to that feature's fuzzy membership functions, then the result is an unknown disturbance. After that, the classification proceeds on the basis of the features which do not have mixing nature. This classification may use more than one feature, but it does not include the features having mixing nature. Some disturbances depend both on features which do not have mixing nature and on features which have mixing nature. For example, an Oscillatory Transient has low harmonics content, which depends on the feature THD, and THD is selected as a feature having mixing nature, whereas an Impulsive Transient depends only on features which do not have mixing nature. So, when classifying on the basis of the features which do not have mixing nature, it is possible that more than one disturbance results. The final disturbance is considered as one main disturbance mixed with the disturbances resulting from the individual features having mixing nature.

Figure 4.17: Fuzzy sequential classifier architecture
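The sequential procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch: the class names, the feature ranges in `RANGES` and `MIXING`, and the helper names `tri` and `classify_sequential` are assumptions made for the example, not values from Table 5.1. THD and DIH are treated as the features having mixing nature, as in the text.

```python
# Hypothetical per-unit feature ranges (assumed for illustration only).
RANGES = {
    "no disturbance":  {"Vrms": (0.9, 1.1), "THD": (0.0, 0.1), "DIH": (0.0, 0.1)},
    "voltage sag":     {"Vrms": (0.1, 0.9), "THD": (0.0, 0.1), "DIH": (0.0, 0.1)},
    "low harmonics":   {"Vrms": (0.9, 1.1), "THD": (0.1, 0.4), "DIH": (0.0, 0.1)},
    "inter harmonics": {"Vrms": (0.9, 1.1), "THD": (0.0, 0.1), "DIH": (0.1, 0.5)},
}
# Disturbances accompanied with each feature of mixing nature.
MIXING = {
    "THD": {"no disturbance": (0.0, 0.1), "low harmonics": (0.1, 0.4)},
    "DIH": {"no disturbance": (0.0, 0.1), "inter harmonics": (0.1, 0.5)},
}


def tri(x, lo, hi):
    """Triangular membership: 1 at the centre of (lo, hi), 0 outside."""
    if not (lo < x < hi):
        return 0.0
    mid = (lo + hi) / 2
    return 1.0 - abs(x - mid) / (mid - lo)


def classify_sequential(x):
    # Step 1: pure classification, min over all features, max over classes.
    pure = {c: min(tri(x[f], *r[f]) for f in r) for c, r in RANGES.items()}
    best = max(pure, key=pure.get)
    if pure[best] > 0:
        return [best]
    # Step 2: max rule on every feature of mixing nature, one at a time.
    found = []
    for f, classes in MIXING.items():
        m = {c: tri(x[f], *rng) for c, rng in classes.items()}
        c = max(m, key=m.get)
        found.append(c if m[c] > 0 else "unknown")
    # Step 3: main disturbance from the remaining features (min rule).
    main = [c for c, r in RANGES.items()
            if min(tri(x[f], *r[f]) for f in r if f not in MIXING) > 0]
    return main + [c for c in found if c != "no disturbance"]


pure_sag = {"Vrms": 0.5, "THD": 0.05, "DIH": 0.05}
sag_with_harmonics = {"Vrms": 0.5, "THD": 0.25, "DIH": 0.05}
print(classify_sequential(pure_sag))
print(classify_sequential(sag_with_harmonics))
```

With these assumed ranges, the first vector is accepted as a pure disturbance in step 1, while the second falls through to steps 2 and 3 and is reported as a main disturbance mixed with the disturbance found from THD.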


5 Classifiers Testing and Results 


5.1 Test Procedure

With the generation of PQ disturbance data described in Chapter 3, we now test the classifiers to evaluate their performance on the PQ classification problem. For that, we should test the classifiers with different data distributions over the feature ranges, as the real-world data distributions are not uniquely defined. In the following, we describe the tested data distributions and the generalised test procedure.

5.1.1 Testing with different data distributions

We have tested the classifier with the following three types of possible data distributions:

a) Uniform distribution: Data is distributed uniformly over all feature ranges. Mathematically, the generation of the j-th feature of a vector for the i-th class is stated as follows:

$$
\mathrm{dataset}(x_j^i) = \min(x_j^i) + \bigl(\max(x_j^i) - \min(x_j^i)\bigr) \cdot \mathrm{rand}([0, 1], N) \qquad (5.1)
$$

where N is the number of data points to generate and rand[0, 1] represents the uniformly distributed random data generator function from 0 to 1.

b) Normal distribution: It is also known as the Gaussian distribution. The generation of data for this distribution is the same as for the uniform distribution, except that the data generator is a normal distribution function.

c) Abnormal distribution: The data is generated by randomly dividing the feature range into subparts and generating a random share of the total data in each subpart of the feature range. Each subpart of the feature range has a normal distribution. Here, the data distribution shapes of different features, for the same and for different classes, are not the same but random.
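The three generators can be sketched as follows. This is a minimal sketch under stated assumptions, not the generator used for the reported results: the function names are hypothetical, the normal distribution is assumed to be centred on the range with a standard deviation of one sixth of the range width and clipped to the range, and the abnormal distribution is assumed to use three subparts.

```python
import random


def uniform_feature(lo, hi, n, rng):
    """Equation 5.1: lo + (hi - lo) * rand([0, 1], N)."""
    return [lo + (hi - lo) * rng.random() for _ in range(n)]


def normal_feature(lo, hi, n, rng):
    """Gaussian data centred on the range (assumed sigma = width / 6), clipped."""
    mu, sigma = (lo + hi) / 2, (hi - lo) / 6
    return [min(hi, max(lo, rng.gauss(mu, sigma))) for _ in range(n)]


def abnormal_feature(lo, hi, n, rng, parts=3):
    """Random subparts of the range, each with a random share of normal data."""
    cuts = sorted(rng.uniform(lo, hi) for _ in range(parts - 1))
    edges = [lo] + cuts + [hi]
    weights = [rng.random() for _ in range(parts)]
    counts = [int(n * w / sum(weights)) for w in weights]
    counts[-1] += n - sum(counts)  # give the rounding remainder to the last part
    data = []
    for a, b, k in zip(edges, edges[1:], counts):
        data += normal_feature(a, b, k, rng)
    return data


rng = random.Random(0)  # fixed seed so the sketch is repeatable
for generate in (uniform_feature, normal_feature, abnormal_feature):
    sample = generate(1.0, 3.0, 1000, rng)
    print(generate.__name__, len(sample), min(sample) >= 1.0, max(sample) <= 3.0)
```

Each call produces the data of one feature of one class; a full data vector is obtained by generating every feature of that class over its own range.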
5.1.2 Generalisation of testing procedure

The common procedure adopted for testing all classifiers is described as follows.

Training set and test set

The data set is split into two parts before it is used for classifier evaluation. The first part, used to design or construct the classifier, is called the Training Set, Learning Set or Knowledge Data set, and the second part, used to evaluate the performance of the classifier, is called the Test Set. The dataset is divided into two parts sequentially after generating it randomly. As a general rule, the first 60% of the data is used as the training set and the rest is used as the test set. It is better to have a large dataset for classifier performance evaluation. Generally, we used a dataset of a minimum of 1000 data vectors for every class. If we use the training set for classification, it is called reclassification. Reclassification would serve only as a proof of the mathematical functionality of the classifier.
The training process of the classifier is shown in Figure 5.1. The training set is a set of pairs of a data vector and its class of belongingness. As shown in that figure, the classifier is trained to classify an unknown test vector using the knowledge of the class type of the data vectors in the training set. In Figure 5.2, the testing procedure of the classifier is shown. In this stage the classifier is assumed to have the capacity to classify the data samples from the training set. Here, the number of test vectors of every class which are correctly classified into their own class, misclassified into some other class, or become unknown is calculated using the knowledge about the class belongingness of every data vector of the test set. The generalisation procedures of the different classifiers are shown in the form of flow charts in figures attached in the appendix. The results of the classifier are presented in the form of an error matrix, which is explained in the next section.
Figure 5.1: Classifier Training Process

Figure 5.2: Classifier Testing Process
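The sequential 60%/40% split of a randomly generated data set can be sketched as below. The function name `split_dataset` and its parameter `train_fraction` are assumptions made for this illustration.

```python
def split_dataset(vectors, labels, train_fraction=0.6):
    """Split sequentially: the first part is the training (knowledge) set,
    the rest is the test set. The data is assumed to be in random order."""
    cut = round(len(vectors) * train_fraction)
    training_set = (vectors[:cut], labels[:cut])
    test_set = (vectors[cut:], labels[cut:])
    return training_set, test_set


# Ten dummy one-feature vectors of a single class, just to show the sizes.
vectors = [[i / 10] for i in range(10)]
labels = ["voltage sag"] * 10
(train_x, train_y), (test_x, test_y) = split_dataset(vectors, labels)
print(len(train_x), len(test_x))
```

Classifying `train_x` again with the trained classifier corresponds to the reclassification mentioned above, while `test_x` is used for the generalisation test.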


Error matrix calculation 


The error matrix H is a result matrix generated on the basis of the classifier performance on the test data. It is a square matrix with size equal to the number of classes (k x k). H_ij ∈ [0, 1] indicates the per-unit fraction of test vectors of the i-th class that are classified into the j-th class. It can also be represented in %. Generally, the diagonal values of the matrix are much higher than the off-diagonal values, as they represent the correct classifications. If, for the i-th row, any off-diagonal value is higher than the diagonal value in that row, or the diagonal value is comparatively lower than the other diagonal values, then the classifier is said to be poor at classifying the vectors of the i-th class and misclassifies a large share of the vectors of that class. The graphical representation of the error matrix is called the histogram chart.

The accuracy of the classifier can be calculated on the basis of the number of misclassified vectors out of the tested vectors in every class. If a vector is classified as unknown or classified into the wrong class, it is called a misclassified vector. The accuracy of the classifier is defined as follows:

$$
\mathrm{accuracy} = \left(\frac{\text{number of correctly classified vectors}}{\text{number of tested vectors}}\right) \cdot 100\,\% \qquad (5.2)
$$
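As an illustration, the error matrix and the accuracy of Equation 5.2 could be computed as in the following sketch. The function names and the small label lists are assumptions made for the example; a prediction of "unknown" is counted in no column and therefore lowers the row sum below 1.

```python
def error_matrix(true_labels, predicted_labels, classes):
    """H[i][j] = per-unit fraction of class-i test vectors classified as class j."""
    index = {c: i for i, c in enumerate(classes)}
    k = len(classes)
    counts = [[0] * k for _ in range(k)]
    totals = [0] * k
    for t, p in zip(true_labels, predicted_labels):
        totals[index[t]] += 1
        if p in index:  # "unknown" results fall into no column
            counts[index[t]][index[p]] += 1
    return [[counts[i][j] / totals[i] if totals[i] else 0.0 for j in range(k)]
            for i in range(k)]


def accuracy(true_labels, predicted_labels):
    """Equation 5.2: correctly classified vectors over tested vectors, in %."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)


true = ["sag"] * 4 + ["swell"] * 4
pred = ["sag", "sag", "sag", "swell", "swell", "swell", "unknown", "sag"]
print(error_matrix(true, pred, ["sag", "swell"]))  # [[0.75, 0.25], [0.25, 0.5]]
print(accuracy(true, pred))                        # 62.5
```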


Development of test vectors 

We have first considered six disturbances having six features for general testing of all classifiers. The lower limit and upper limit for every feature of every disturbance are shown in Table 5.1. The total ranges of all features are normalised, and the individual feature ranges of all disturbances are calculated linearly on that basis. The duration range is given in its original milliseconds and the rise time range is normalised to 20. The data generation for this disturbance-feature set can be done on a 1 per unit or 100 % normalised scale. As they are related linearly, just a linear multiplication factor converts the scaling from per unit to percent and vice versa. Normally,
Note: Vm = Peak voltage; Vrms = Voltage magnitude; Te = Duration of Event; tp = Rise time; THD = Total Harmonic Distortion; DIH = Dominant Inter harmonics.

Table 5.1: Normalised feature ranges of six disturbances with six features by [0, 1 pu]. These ranges are used for different experiments on classifiers to decide the optimal feature ranges and distributions of the data points. Here, the duration feature is shown in ms and the rise time is shown in a range of [0, 20]. This data range structure is used for testing different classifiers.

duration and rise time are redistributed with a logarithmic function after generation, so that their ranges become comparatively symmetrical to the other feature ranges. The reason for this will be mentioned in the next section. The distribution histograms of all six individual features of all disturbances for the uniform, normal and abnormal distributions are shown in three different figures attached in the appendix.
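The effect of the logarithmic redistribution on the range widths can be illustrated with a short sketch. The two duration ranges below are hypothetical values chosen only to show the effect; they are not the ranges of Table 5.1.

```python
import math


def log10_range(lo, hi):
    """Map a positive feature range onto a log10 scale."""
    return math.log10(lo), math.log10(hi)


# Hypothetical duration ranges in ms for a short and a long event.
short_event = (0.05, 1.0)
long_event = (10.0, 1000.0)

for name, (lo, hi) in (("short", short_event), ("long", long_event)):
    log_lo, log_hi = log10_range(lo, hi)
    print(name, "linear width:", hi - lo, "log10 width:", round(log_hi - log_lo, 3))
```

On the linear scale the two widths differ by about three orders of magnitude, whereas on the log10 scale they are of comparable size, which is the symmetry of feature ranges referred to above.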

5.2 Test of Individual classifiers

5.2.1 Euclidean Minimum Distance Classifier

This is the first classifier we implemented for the PQ classification problem. Because of its simple structure for analysis, we carried out different tests on the data generated from Table 5.1. On the basis of these tests we found some basic structure for the data generation, which is described as follows.
Visualisation and analysis of error matrix

The data set generated from the feature ranges shown in Table 5.1 is used for testing the Euclidean classifier. Here, the duration and rise time values are redistributed using the log10 function to provide uniformity of the feature ranges for different classes. The results of generalisation are shown in Figure 5.3. It can be observed that the error matrix for the first class of belongingness, shown first from the left in the figure, has good accuracy.

Three error matrices are shown in the figure. The second and third from the left show the second and third class of belongingness. Every row shows the results for the samples of that row's class. The height of every bar indicates the fraction of the total test samples from the row class that are classified into the class indicated by the class index below it. The second and third belongingness results provide additional information about the classes surrounding the main class.

The results without using logarithmic functions are shown in a figure attached in the appendix. They do not have good accuracy, due to the uneven widths of duration and rise time for different disturbances. It is also found that a linear transformation of the ranges does not affect the results of the Euclidean distance classifier. Trials were therefore done with nonlinear transformations of the feature ranges to improve the results. The results for some feature ranges, changed for uniformity of the feature ranges of different disturbances, are attached in the appendix. It is found that the results of the Euclidean distance classifier are affected by a nonlinear transformation of the feature ranges of different classes. The shifting of boundaries results in a change in the second and third class belongingness, which can be observed in the same figures with nonlinear transformation or mapping of some features.
The test results with abnormally distributed data are shown in a figure in the appendix. It is found that the classifier becomes less accurate if the data is abnormally distributed over the feature ranges.
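A minimum distance classifier that also reports the second and third class of belongingness can be sketched as follows. The function names and the six training vectors are assumptions made for this illustration; the classes are simply ranked by the Euclidean distance of their mean vectors from the test vector.

```python
import math


def train_means(vectors, labels):
    """Compute one mean vector per class from the training set."""
    sums, counts = {}, {}
    for v, c in zip(vectors, labels):
        s = sums.setdefault(c, [0.0] * len(v))
        for i, value in enumerate(v):
            s[i] += value
        counts[c] = counts.get(c, 0) + 1
    return {c: [value / counts[c] for value in s] for c, s in sums.items()}


def rank_classes(means, x):
    """Classes ordered by distance: first, second, third belongingness, ..."""
    return sorted(means, key=lambda c: math.dist(means[c], x))


vectors = [(0, 0), (0, 2), (10, 10), (10, 12), (5, 4), (5, 6)]
labels = ["A", "A", "B", "B", "C", "C"]
means = train_means(vectors, labels)
print(rank_classes(means, (1, 1)))  # ['A', 'C', 'B']
```

Because only the mean vectors are stored, the memory requirement and testing time are small, but a test vector far from every class is still assigned to the nearest mean.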

Pros and cons of the classifier 

From the testing results above, the pros of the Euclidean classifier can be described as follows:

• It has a simple design structure that is easy to understand.

• The classifier is easy to develop.

• It has a low memory requirement.

• It has a short testing time.

The cons of the classifier are listed as follows:

• It has poor accuracy.

• It depends on only one vector, called the mean vector, for every class, so if the feature ranges of a particular disturbance are uneven, it is not suitable for good results.

• The location of the hyperplane decision surface depends on the distance between the mean vectors of neighbouring disturbance classes. Due to the nonuniform distribution of the mean vectors of different classes in the input space, the decision surface of a particular class is not equally distant in all directions from its mean vector. This may lead to misclassification of some of the pattern vectors of that class.

• Due to the uneven volumes of the training vector clouds of different disturbances, it is possible that some test vectors of a class having a large volume fall onto the other side of the decision boundary. This results in the misclassification of those test vectors.

• It works on the minimum distance, which is just a comparative measure. So, if the input test vector is unknown and does not belong to any of the knowledge classes, it will still be misclassified into one of the knowledge classes.

• The class representative is only one vector. So, it gives unexpected results during testing if the data distribution of every class is not symmetrical around the mean vector.

Figure 5.3: Euclidean distance classifier error matrix, where the total range is [0, 1] and rise time and duration are redistributed with log10. There are 1000 data samples for every class, of which 60% are used for training and 40% for testing. The original data of all features are distributed uniformly. The results are improved considerably compared to those in Figure A.7. So, it is necessary here to redistribute the data of the duration and rise time features logarithmically to improve the results. The second and third columns provide the results in the form of error matrices for the second nearest and third nearest classes, as mentioned in Figure A.7.

5.2.2 Bayes Classifier

Having seen some tests on the Euclidean distance classifier to improve the data distributions, from now on we will consider only the data of 6 disturbances with 6 features, normalised to the [0, 1 pu] range on a linear scale as shown in Table 5.1, with duration and rise time redistributed with the log10 function. We will test the classifiers with all three types of data distributions. The results after testing the Bayes classifier are discussed as follows.

Visualisation and analysis of error matrix 

Figure 5.4 shows the error matrix for the data mentioned above with a uniform distribution of 1000 data samples for every class. The representation of the error matrix is the same as for the first class of belongingness in the Euclidean classifier. We can observe that the error matrix histogram is almost diagonal, which means that almost all test samples are classified into their own class. The number of misclassified vectors is very small here. The accuracy of classification is above 99 % for all classes. So the results are better than any results with the Euclidean distance classifier seen in Section 5.2.1. Here, the results are based on the Bayes probability value of the different classes for the particular input sample. This probability value follows the Gaussian distribution principle for the cluster of a particular class. So, if the sample is far from the cluster, its Bayes probability of belongingness to that class is very low. If the sample belongs to that particular class cluster, the Bayes probability of belongingness to that class is much higher than for the other classes. So, unlike with the Euclidean classifier, there is no possibility of finding the second class of belongingness in the case of the Bayes classifier.

Figure 5.4: Bayes classifier error matrix, where the number of data samples, data distribution, feature range and individual class feature ranges are the same as in Figure 5.3.

We have tested this classifier with the same a priori probabilities for all classes (maximum likelihood). If different a priori probabilities are considered, then only the multiplication factor of the Bayes probability function changes for the different classes; the variance and the mean values are not affected. So, if two classes are overlapping, this principle of a priori probability will work nicely. The sum of the a priori probabilities of all classes is always unity.
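The role of the a priori probability as a pure multiplication factor can be seen in the following sketch. It is a simplified illustration and not the classifier tested here: it assumes a diagonal covariance matrix (independent features) to stay within the standard library, and all function names and data values are hypothetical.

```python
import math


def fit_gaussians(vectors, labels):
    """Per class: mean and variance of every feature (diagonal covariance assumed)."""
    by_class = {}
    for v, c in zip(vectors, labels):
        by_class.setdefault(c, []).append(v)
    model = {}
    for c, rows in by_class.items():
        n = len(rows)
        mean = [sum(col) / n for col in zip(*rows)]
        var = [sum((x - m) ** 2 for x in col) / n + 1e-9  # small floor avoids zero variance
               for col, m in zip(zip(*rows), mean)]
        model[c] = (mean, var)
    return model


def classify_bayes(model, priors, x):
    """Pick the class with the largest log(prior) + log(Gaussian likelihood)."""
    scores = {}
    for c, (mean, var) in model.items():
        log_likelihood = sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                             for xi, m, v in zip(x, mean, var))
        scores[c] = math.log(priors[c]) + log_likelihood
    return max(scores, key=scores.get)


# Two overlapping one-feature classes; x = 2.0 lies exactly between the means.
vectors = [(0.0,), (1.0,), (2.0,), (2.0,), (3.0,), (4.0,)]
labels = ["A", "A", "A", "B", "B", "B"]
model = fit_gaussians(vectors, labels)
print(classify_bayes(model, {"A": 0.2, "B": 0.8}, (2.0,)))  # B
print(classify_bayes(model, {"A": 0.8, "B": 0.2}, (2.0,)))  # A
```

The means and variances stored in `model` are identical in both calls; only the prior term changes, and for a sample in the overlapping region it decides the class.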

Pros and cons of the classifier 

The pros of the Bayes classifier can be stated as follows:

• The results have very good accuracy. It can be used as a reference measure of accuracy for any other type of classifier, since it provides almost ideal results.

• It works on the principle of a priori probability. This is useful for our case because, in a real power system, different events do not have the same frequency of occurrence; e.g. some surveys in different parts of the world say that Voltage Sag has the highest frequency of occurrence. So, the a priori probability can be calculated for every class on the basis of its frequency of occurrence, which can improve the results.

• It is fast enough for testing a large number of data.

• It calculates the Bayes probability using the covariance matrix, so even if the variances of different features for the same class are not the same, the results are not affected, unlike in the Euclidean distance classifier.

• The Bayes probability value for every class depends only upon the distribution of its own class data samples. So, the orientation of the data samples of other neighbouring classes does not affect the results, unlike in the Euclidean distance classifier.

• The sizes of the different class clusters do not affect the results here.

• The Bayes probability value is very low if the sample is outside the class cluster. So, it is possible to limit the misclassification of an unknown class sample into any known class if the sample does not belong to any of the known classes.

• The results improve if the number of data samples is large enough.

The cons of the Bayes classifier can be stated as follows:

• The Bayes probability works on the basic principle of the Gaussian distribution. If the data samples of any class are not distributed according to a Gaussian distribution, the Bayes probability function may have a wrong mean value and covariance matrix, which may lead to misclassification. This case is dangerous when the data samples of different classes are just touching and the distributions over the feature ranges are uneven or abnormal. Real power system event data is not Gaussian distributed for all features, and the distribution even varies from class to class for the same feature. The accuracy of results on real system data may therefore be lower than these results. For example, one abnormal case is shown in Figure A.1. So, the results become less accurate if the data is distributed towards the corners of some features.

• It doesn't work well when the classes are overlapping.

• It has smooth boundaries around the mean vector, so data at the end limits of the features, which may lie in the corners, may be misclassified if data samples of another class adjoining the border are present.
5.2.3 k-NN Classifier

We have tested the k-NN classifier with the data generated from the data ranges shown in Table 5.1. We have tested with k=1 and k=10 nearest neighbours for 1000 data samples per class in the dataset and compared the results. The results are discussed in the next section. Some additional advantages of this classifier compared to other classifiers in a PQ classification system are also discussed. It is tested with the different types of data distributions discussed in Section 5.1.1 and the results are compared.

Visualisation and analysis of error matrix 

The results for uniformly distributed data, with normalisation of the feature data ranges by [0, 1 pu] as shown in Table 5.1 and with log10 applied to the duration and rise time features, are shown in Figure 5.5. The first and second columns from the left show the error matrices for the results of the 1-NN and 10-NN classifiers respectively. It can be observed that the results of the 1-NN classifier are slightly better than those of the 10-NN classifier. Both results are better than the Euclidean distance classifier results for the same data, shown in Figure 5.3. As mentioned in Section 4.3, the k-NN classifier provides piecewise linear classification, which improves the nonlinearity of the decision boundaries. So, it is more capable of classifying when the data samples of neighbouring classes are very near to each other. If the number of nearest neighbours (the value of k) is decreased, the nonlinearity of the decision surface will increase.

We can also see in Figure 5.5 that the third and fourth columns from the left represent the error matrices for the second nearest class and third nearest class for the 10-NN classifier. These can be calculated from the selected k neighbours of every test sample. This information is not available in the case of the 1-NN classifier, because it has only one neighbour for class information. If the test sample is inside the cluster of the class data samples, then almost all k neighbours will have the same class belongingness. But if the test sample is on the boundary of the class cluster, then it is possible that its k neighbours have different class belongingness. In that case, some of the k neighbours come from neighbouring class clusters, and one can obtain information about the second belongingness of the test data sample. So, the classifier provides the probabilities of belongingness of the test data sample to different classes. This is called the information about classification quality. No other classifier provides this type of information, so this is the unique feature of the k-NN classifier. From the figure, one can observe the histogram chart of the second class and third class belongingness for the test samples of every class in per unit values. The number of test samples of every class is the base value for all graphs. For the first row, the results of the Impulsive transients test samples, there is no information in the third and fourth columns. This is because, out of the k neighbours, there is no neighbour from any class other than the one class to which the test sample belongs. For oscillatory transients, the number of test samples which have a second or third class of belongingness is very small.
The results of the k-NN classifier tested with abnormal data are shown in Figure A.11. Here it is observed that when the data is abnormal, the results with 1-NN and 10-NN become less accurate. The possibility of misclassification into a neighbouring class increases when the class cluster has a varying density of training data points throughout the cluster. So, it would be better to have a uniform distribution of knowledge data points for every class cluster to get good results from the k-NN classifier. As shown in the figure, voltage sag and harmonics in particular are misclassified into the neighbouring class of voltage sag and into each other. The additional information about classification quality mentioned in the last paragraph is useful here.

Figure 5.5: k-NN classifier error matrices, where the data distribution, feature range and individual class feature ranges are the same as in Figure 5.3. 100 data samples for every class, of which 60% are used for training and 40% for testing. The structure of the four error matrices is the same as in Figure A.11. The results for k=1 and k=10 NN are almost the same. They are slightly better than those of the Euclidean distance classifier seen in Figure 5.3. The figure gives information about the orientation of the class clusters, as discussed in Figure A.11.
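The information about classification quality can be obtained from the class shares among the k nearest neighbours, as in the following sketch. The function name `knn_belongingness` and the six knowledge data points are assumptions made for this illustration.

```python
import math
from collections import Counter


def knn_belongingness(train_vectors, train_labels, x, k):
    """Return (class, share of the k nearest neighbours), most frequent first."""
    nearest = sorted(range(len(train_vectors)),
                     key=lambda i: math.dist(train_vectors[i], x))[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return [(c, n / k) for c, n in votes.most_common()]


train_vectors = [(0, 0), (0, 1), (1, 0), (3, 3), (3, 4), (4, 3)]
train_labels = ["A", "A", "A", "B", "B", "B"]

# A sample near the boundary: the second entry is its second class of belongingness.
print(knn_belongingness(train_vectors, train_labels, (1.4, 1.4), k=4))  # [('A', 0.75), ('B', 0.25)]
# With k = 1 there is only one neighbour, so no second class can be reported.
print(knn_belongingness(train_vectors, train_labels, (1.4, 1.4), k=1))  # [('A', 1.0)]
```

Every call computes the distance to all knowledge data points, which is the reason for the large storage requirement and slow testing noted for this classifier.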

Pros and cons of the classifier 

The pros of the k-NN classifier can be stated as follows:

• It has better accuracy than the Euclidean distance classifier.

• The classifier structure is very simple to understand.

• It provides additional information about the classification quality.

• The effectiveness of the classifier can be tuned by changing the number of nearest neighbours (the value of k).

• It is capable of classifying class clusters which are overlapping, lie one inside another, or are arranged in the form of a sandwich.

• The results are not affected by the shape or size of the cluster.

• Training of the classifier is very simple, and improving the knowledge database to modify the performance of the classifier is simple.

The cons of the k-NN classifier can be stated as follows:

• Accuracy is not as good as that of the Bayes classifier.

• It requires large storage for the knowledge data points.

• It is very slow, due to the calculation of the Euclidean distance from the test point to each and every knowledge data point.

• The results are affected by the uneven distribution density of the knowledge data points of different neighbouring class clusters.

• If the distribution of the data points in a particular class is abnormal, misclassification may increase.
5.2.4 Fuzzy Classifier

As shown in Figure 5.6, the error matrix of the fuzzy classifier for uniformly distributed data is almost diagonally predominant. The fuzzy classifier rarely produces any misclassification. The results are the best of all classifiers tested here. The results may become worse if there is an abnormal data distribution. The membership function used here is an equilateral triangle for every feature of all classes. The mean value is calculated from the mean of the data points, and the base of the triangle is calculated on the basis of the maximum width on either side of the mean value of the feature range. The result does not change if the membership function is changed to trapezoidal, hyperbolic, parabolic, circular or some other shape, as long as the limits remain the same. It was also tested with Gaussian membership functions; the results are almost the same as the other results of the fuzzy classifier. It can be recognised that it functions almost the same as the Bayes classifier. Here we have some flexibility in membership function generation, unlike in the Bayes classifier.
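The construction of the triangular membership functions and the max-min decision can be sketched as follows. This is an illustrative sketch only: the function names and the training vectors are assumptions, and the half base of each triangle is taken as the larger distance from the mean to either limit of the training data, following the description above.

```python
def build_fuzzy_model(vectors, labels):
    """Per class and feature: (mean, half base), where the half base is the
    larger distance from the mean to the minimum or maximum of the data."""
    by_class = {}
    for v, c in zip(vectors, labels):
        by_class.setdefault(c, []).append(v)
    model = {}
    for c, rows in by_class.items():
        features = []
        for col in zip(*rows):
            mean = sum(col) / len(col)
            features.append((mean, max(mean - min(col), max(col) - mean)))
        model[c] = features
    return model


def membership(x, mean, half_base):
    """Symmetric triangular membership around the mean."""
    if half_base == 0:
        return 1.0 if x == mean else 0.0
    return max(0.0, 1.0 - abs(x - mean) / half_base)


def classify_fuzzy(model, x):
    """Min over features, max over classes; zero membership means unknown."""
    scores = {c: min(membership(xi, m, h) for xi, (m, h) in zip(x, feats))
              for c, feats in model.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


vectors = [(0.2, 0.05), (0.5, 0.02), (0.8, 0.08),
           (1.2, 0.05), (1.5, 0.02), (1.8, 0.08)]
labels = ["sag", "sag", "sag", "swell", "swell", "swell"]
model = build_fuzzy_model(vectors, labels)
print(classify_fuzzy(model, (0.6, 0.05)))  # sag
print(classify_fuzzy(model, (1.0, 0.05)))  # unknown
```

Only the mean and the limits of every feature are stored per class, and a sample outside all triangles is reported as unknown rather than forced into a known class.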

As we have discussed for the feature selection stage in Section 3.4.1, the features are selected such that their values identify the particular class. So, it is good to apply the fuzzy classification strategy developed here, based on a membership function for every feature of all disturbance classes. This strategy can therefore be used to detect mixed disturbances by step-by-step classification based on the individual feature values of the unknown or test sample. This will be evaluated in Section 5.2.6.

Figure 5.6: Fuzzy classifier error matrix, where the number of data samples, data distribution, feature range and individual class feature ranges are the same as in Figure 5.3.

Pros and cons of the classifier 

The pros of the Fuzzy classifier can be stated as follows:

• Accuracy is very good compared to all other classifiers.

• It is simple to understand, and modification for knowledge improvement is easy.

• There is flexibility to select different types of membership functions.

• Testing is fast enough.

• Feature-by-feature classification in steps can be developed easily. So, it can be used for sequential classification.

• Less storage is required.

• Given the way the power system disturbance data is generated and the ranges are created, the fuzzy classifier strategy is good enough for classification here.

• It classifies a test sample which does not belong to any class as unknown class, instead of misclassifying it into some other class.

• Only the minimum and maximum limits of every feature of all disturbances are enough to construct the classifier.

The cons of the Fuzzy classifier can be stated as follows:

• Results become poor if the classes are overlapping or one surrounds the other.

• More than one membership function for any feature of a class is not allowed by the classifier strategy we used here.

• If the min and max limits are not accurate enough, the possibility of misclassification increases. This may be one case of abnormal data distribution. The classifier works on the mean value and limits of the feature data points, so abnormality of the data points is not desired. Data symmetry on both sides of the mean value is desired.


5.2.5 RCE Classifier

The results of testing the RCE classifier are shown in Figure 5.7. The results are fairly accurate compared to the Euclidean distance classifier and the k-NN classifier.



Figure 5.7: RCE classifier error matrix, where the number of data samples, data distribution, feature range and individual class feature ranges are the same as in Figure 5.3.


Pros and cons of the classifier 

The pros of the RCE classifier can be stated as follows:

• It has good accuracy.

• It is easy to train the RCE classifier at any stage.

• Less storage is required.

• Testing is fast.

• It can classify a test sample which does not belong to any class as unknown class, instead of misclassifying it into some other class.

• It is not affected by the uneven size of different class clusters.

• It can classify classes separated by nonlinear decision boundaries.

• More training improves the results.

• Data distribution does not affect the results.

The cons of the RCE classifier can be stated as follows:

• Training is slow.

• It fails when the clusters overlap.

• It is difficult to reach accuracy as high as that of the Bayes classifier.

5.2.6 Fuzzy Sequential Classifier

The fuzzy sequential classifier has been tested on mixed disturbance data. The data generation for mixed disturbances is the same as for pure disturbances. The only change is in the ranges of the test vectors of the different classes, according to the type of mixed disturbance. The knowledge dataset contains samples of pure disturbances and the test dataset contains samples with mixed disturbances. The number of output classes is predefined and includes all pure disturbances plus the possible mixed disturbances.

Visualisation and analysis of error matrix 

As the results in Figure 5.8 show, the classifier has good accuracy for the classification of pure as well as mixed disturbances. It can classify 8 pure disturbances and 12 mixed disturbances. Here, the results of only 3 of these mixed disturbances are shown as an example. It is accurate for testing any of the disturbances out of the 24 different classes created from the 8 basic classes. Only four features are used here. The two features duration and rise time are not selected. Only the fuzzy classification technique is used here for sequential classification. Some other classifiers could also be implemented for sequential classification.

Pros and cons of the classifier 

The pros of the Sequential Fuzzy classifier can be stated as follows:

• It can provide information about the mixed classes and subclasses.

• More detailed classification is possible.

• It is fast enough compared to other classifiers.

The cons of the Sequential Fuzzy classifier can be stated as follows:

• The structure is complex.

• Whether disturbances are mixed is decided by a particular feature. So, there must be some feature which can detect the disturbance alone, without using the information from other features.

• Only a limited number of mixed disturbances can be classified.




Figure 5.8: Sequential fuzzy classifier error matrix with eight knowledge
classes and 11 tested classes using four features. It shows that the fuzzy
sequential classifier classifies mixed as well as pure disturbances with
good accuracy. The disturbances in row sequence are as follows:
1 Imp. Tran., 2 Osc. Tran., 3 V. Sag, 4 V. Swell, 5 Low Harmonic,
6 High Harmonic, 7 Interharmonic, 8 No Disturbance,
9 Imp. Tran. + Low Harmonic, 10 V. Sag + Interharmonic,
11 V. Swell + High Harmonic. The 24 columns of the error matrix represent
8 pure disturbances and 12 mixed disturbances.

5.3 Total Results

So far the classifiers have been tested individually with different data, owing
to the state of the development progress. Now all classifiers are trained and
tested under equal conditions. The same data set of 7 disturbance classes and 4
optimum features is used to evaluate the generalisation capacity of all the
different classifiers. The results are shown in Table 5.2. The results of the
sequential classifier are not shown, because it functions in the same way as
the fuzzy classifier for pure disturbances. It differs only when the test data
has mixed disturbance samples. On the basis of these results, and of the pros
and cons of every classifier discussed in the last section, we conclude that
the Bayes, the k Nearest Neighbour and the fuzzy classifiers together are best
for our implementation work. In the next chapter we present the User Interface
Program for the implementation of PQ disturbance classification, with some
examples.


Disturbance       Euclidean    Bayes        Fuzzy        Euclidean k Nearest    RCE neural network
type              classifier   classifier   classifier   Neighbour classifier   classifier

no disturbance    100%         100%         100%         100%                   100%
impulsive tran    78.800%      100%         100%         [illegible]            [illegible]
[label missing]   78.800%      100%         100%         100%                   99.875%
[label missing]   99.575%      100%         100%         100%                   98.5%
voltage swell     100%         100%         100%         [illegible]            96.875%
harmonics         81.9%        100%         100%         [illegible]            87.775%
interharmonics    100%         100%         100%         [illegible]            99.975%

Table 5.2: Results of the generalisation test with 4000 uniformly distributed
test patterns for each disturbance, after training with 6000 sets for each
type of disturbance. Every class has four optimum selected features.
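
The per-class percentages reported in this chapter can be read off the diagonal of an error matrix. The following is a minimal Python sketch of that bookkeeping, not the Matlab code used in this work. It assumes that rows hold the true class, columns the assigned class, and entries the percentage of each row's test samples. The function names and the small label lists are illustrative only.

```python
from collections import Counter

def error_matrix(true_labels, predicted_labels, classes):
    """Rows: true class, columns: assigned class, entries: percent of the row's samples."""
    counts = {c: Counter() for c in classes}
    for t, p in zip(true_labels, predicted_labels):
        counts[t][p] += 1
    matrix = []
    for c in classes:
        total = sum(counts[c].values())
        matrix.append([100.0 * counts[c][p] / total if total else 0.0 for p in classes])
    return matrix

def per_class_accuracy(matrix, classes):
    """The diagonal of the error matrix is the share of correctly classified samples."""
    return {c: matrix[i][i] for i, c in enumerate(classes)}

# Illustrative labels only (not results from this work).
classes = ["no disturbance", "voltage sag", "voltage swell"]
true_labels = ["no disturbance"] * 4 + ["voltage sag"] * 4 + ["voltage swell"] * 4
predicted = (["no disturbance"] * 4
             + ["voltage sag"] * 3 + ["no disturbance"]
             + ["voltage swell"] * 4)

matrix = error_matrix(true_labels, predicted, classes)
accuracy = per_class_accuracy(matrix, classes)
print(accuracy)
```

Off-diagonal entries show which class a misclassified sample was assigned to, which is the information the error matrix figures visualise.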












































6 Realised Classifier 


In this chapter, the realised classifier system is presented. The system is
implemented in Matlab [11]. Several GUIs (Graphical User Interfaces) are
provided for modification and result assessment of the classifier.


6.1 Basic Structure of parallel classifier

The simple structure of the parallel classifier design using the GUI system is
shown in Figure 6.1. It works on the principle of parallel classification of an
unknown input vector using the Bayes, the fuzzy and the k-NN classifiers. The
user can generate a new database or use a previously generated database. He can
select a test vector or create a new one. The classification system presents
the results of the three different classifiers to the user, and the votes of
all classifiers on the class of the unknown input vector are disclosed. There
are two possible outcomes. If the results of all three classifiers agree,
nothing further has to be done. If the result of any classifier disagrees with
the other classifiers, the system provides the flexibility to train the
classifiers for that unknown vector. The user can add a vector to the database
when he is not satisfied with the results and wants to improve the database. We
also developed a GUI system implementing the fuzzy sequential classification
system for mixed as well as pure disturbance classification. The main steps of
the GUI system developed in Matlab are explained in the following.
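
The agreement check between the three parallel classifiers can be sketched as follows. This is a minimal Python illustration and not the Matlab implementation. The function name `parallel_vote` and the dictionary keys are hypothetical, and the three classifier outputs are passed in as plain labels.

```python
from collections import Counter

def parallel_vote(results):
    """results: dict mapping classifier name -> predicted class label.

    Returns the most voted label, a flag telling whether all classifiers
    agree, and the vote count per label.
    """
    votes = Counter(results.values())
    label, count = votes.most_common(1)[0]
    return label, count == len(results), dict(votes)

# All three classifiers agree: nothing further has to be done.
agree = parallel_vote({"bayes": "voltage swell",
                       "fuzzy": "voltage swell",
                       "k-NN": "voltage swell"})

# One classifier disagrees: the vector is a candidate for retraining.
split = parallel_vote({"bayes": "voltage swell",
                       "fuzzy": "voltage swell",
                       "k-NN": "no disturbance"})
print(agree)
print(split)
```

When the agreement flag is false, the decision is left to the user, who may add the vector to the database under the class he considers correct.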


6.2 Database Creation

Figure 6.1: Basic structure of realised parallel classifier

The control window for database creation on the Matlab platform is shown in
Figure 6.2. First the user has to enter the number of classes and the number of
features. Clicking OK opens a new window, as shown in the figure, in which the
parameters required to generate the new database are filled in. The figure
shows the filled-in window for 8 disturbances and 4 features. The user has to
provide the number of knowledge data for every class under the name filled in
for every disturbance, as shown in the figure. He has the flexibility to choose
the type of distribution for every feature, e.g. normal, uniform etc., and the
type of redistribution, i.e. whether to use a logarithmic or a linear scale,
for every feature under the filled-in feature names. Then he has to fill in the
minimum and maximum limits for all features of every disturbance, as shown in
the figure. The scale represents a linear scaling factor from real data to the
classifier input data. After completing this process, the user has to click on
the create data button, which asks for the file name of the newly created
database. The program generates the data vectors according to the information
filled in and stores the whole structure in the new data file in Matlab.
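
The database generation step can be illustrated with a short Python sketch. It is written under several assumptions and is not the Matlab code of the realised system. The normal distribution is centred in the feature range with the limits at three standard deviations and clipped to the range. The logarithmic redistribution uses log10 and is applied before the linear scale factor. The feature limits in the example are placeholders, not the ranges used in this work.

```python
import math
import random

def generate_feature(low, high, n, rng, distribution="uniform",
                     log_scale=False, scale=1.0):
    """Draw n values of one feature between low and high."""
    if log_scale and low <= 0:
        raise ValueError("logarithmic redistribution needs a positive lower limit")
    values = []
    for _ in range(n):
        if distribution == "uniform":
            v = rng.uniform(low, high)
        elif distribution == "normal":
            # Assumption: mean at the range centre, limits at +/- 3 sigma, clipped.
            mu = (low + high) / 2.0
            sigma = (high - low) / 6.0
            v = min(max(rng.gauss(mu, sigma), low), high)
        else:
            raise ValueError("unknown distribution: " + distribution)
        if log_scale:
            v = math.log10(v)  # assumption: redistribution happens before scaling
        values.append(scale * v)
    return values

def generate_database(limits, n_per_class, options=None, seed=0):
    """limits: class -> feature -> (min, max); options: feature -> keyword settings."""
    rng = random.Random(seed)
    options = options or {}
    database = {}
    for cls, features in limits.items():
        columns = [
            generate_feature(low, high, n_per_class, rng, **options.get(name, {}))
            for name, (low, high) in features.items()
        ]
        database[cls] = [list(vector) for vector in zip(*columns)]
    return database

# Placeholder limits for two classes and two features.
limits = {
    "voltage sag":   {"rms": (0.1, 0.9), "duration": (0.01, 60.0)},
    "voltage swell": {"rms": (1.1, 1.8), "duration": (0.01, 60.0)},
}
options = {"rms": {"distribution": "normal"}, "duration": {"log_scale": True}}
db = generate_database(limits, n_per_class=5, options=options)
print(db["voltage sag"][0])
```

Fixing the seed makes a generated database reproducible, which is convenient when several classifiers have to be compared on identical data.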


6.3 Testing and Result Assessment

The Matlab window for testing the parallel classifier is shown in Figure 6.3.
First of all the user has to select the database before testing, by clicking on
the From database button and selecting one of the created databases. He can
then generate the unknown vector by clicking on the Generate button and filling
in the vector values, or he can select one vector from the database for testing
purposes. In the latter case he chooses the disturbance class of the test
vector and the type of database, i.e. test database or knowledge database, and
then clicks on the choose button to select a new test vector of the specified
class and database. Later, the test vector of a recorded signal will be
inserted here automatically online and classified. Once the input vector is
ready in the vector box, he just has to click on the Classify button to assess
the results of the parallel classifier system.
Figure 6.2: User Interface Program: artificial database generation window


Figure 6.3: User Interface Program: parallel testing of the unknown input vector

As shown in the figure, one input vector of a swell disturbance is chosen from
the test database. The results show that the fuzzy, Bayes, 1-NN and k-NN
classifiers give the same result, and all classify the unknown input vector
correctly. The k-NN classifier provides extra information about the
neighbouring class or the quality of the input vector. Here it indicates that
99 % of the neighbours are from the same class as the input vector and only one
vector is from the no disturbance class. This says that the input vector has a
small swell, which can be observed from the feature values of the vector
displayed. The second feature is the RMS value in the selected database. Here
the results of all classifiers agree on swell, so the disturbance is decided as
swell. If the results of the classifiers disagree, the results are said to be
wrong. In that case the user has to look at the waveform of the disturbance,
decide which type of disturbance it originally was, and modify the database to
train all the classifiers for that vector. If the same type of disturbance
occurs in future, it will then hopefully be classified correctly by all the
classifiers.
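
The neighbour information reported by the k-NN classifier can be sketched in Python as the share of each class among the k nearest knowledge vectors. This is only an illustration under the assumption of a plain Euclidean distance. The function name and the small two-feature knowledge set are hypothetical and do not come from the databases used in this work.

```python
import math
from collections import Counter

def knn_neighbour_shares(x, knowledge, k):
    """knowledge: list of (vector, label) pairs.

    Returns (label, share) pairs for the k nearest neighbours of x,
    sorted with the most frequent class first.
    """
    nearest = sorted(knowledge, key=lambda item: math.dist(x, item[0]))[:k]
    counts = Counter(label for _, label in nearest)
    return [(label, n / len(nearest)) for label, n in counts.most_common()]

# Hypothetical knowledge vectors: (peak voltage, RMS voltage) in per unit.
knowledge = [
    ((1.10, 1.10), "voltage swell"),
    ((1.15, 1.15), "voltage swell"),
    ((1.20, 1.20), "voltage swell"),
    ((1.30, 1.30), "voltage swell"),
    ((1.00, 1.00), "no disturbance"),
    ((0.98, 0.98), "no disturbance"),
    ((1.02, 1.02), "no disturbance"),
]

# A small swell close to the boundary with the no disturbance class.
shares = knn_neighbour_shares((1.12, 1.12), knowledge, k=4)
print(shares)
```

The first entry is the class decision. The remaining entries name the neighbouring classes, and a large share for the second class indicates an input vector close to a class boundary.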

If the user wishes to add the tested vector to the database, he has to click on
the Add vector to database button and is then asked whether he wants to add it
to a particular class or not. The user selects the class to which he wants to
add the new vector from the pull-down menu shown in the figure. Here the no
disturbance class is displayed in the pull-down menu.

6.4 Sequential classifier implementation

Figure 6.4: User Interface Program: sequential classifier implementation

The Matlab window for the mixed classifier is shown in Figure 6.4. As in the
parallel classifier, the user first has to select a database. Now he has to
establish a relation between the features and the disturbances. He has to
select the feature which has mixing nature in the left column, together with
the disturbances, other than no disturbance, which can occur due to that
feature only, e.g. dih (Dominant Interharmonic) can create the interharmonic
mixed disturbance. The selection is shown in the figure. Likewise, the thd
(Total Harmonic Distortion) feature has been selected with the mharmonic (more
harmonics) and lharmonic (less harmonics) disturbances. The selection of the
test vector is the same as discussed for the parallel classifier, or the user
can create the input vector himself by filling in the vector box shown. After
that he has to click on Classify to assess the results of the fuzzy sequential
classifier. He will be asked to select the no disturbance class out of all
classes. After clicking OK he gets the results of the classifier, which may
consist of more than one disturbance. There will be one resulting class from
every feature which has mixing nature, and one or more from the features which
don't have mixing nature. The information from every feature which has mixing
nature identifies which type of disturbance is mixed in due to that particular
feature, if there is one. Otherwise it says that no disturbance is mixed in due
to that feature, by identifying it as no disturbance. The result from the rest
of the features, which don't have mixing nature, identifies the main
disturbance.
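
The sequence of decisions described above can be sketched in Python. This is a simplified illustration and not the realised Matlab classifier. It assumes trapezoidal membership functions built from the class feature ranges and a fuzzy AND realised as the minimum over the features. All ranges, the margin parameter and the function names are hypothetical. Only the feature and class names thd, dih, mharmonic and lharmonic are taken from the text.

```python
def membership(value, low, high, margin=0.1):
    """Trapezoidal membership: 1 inside [low, high], linear fall-off outside."""
    if low <= value <= high:
        return 1.0
    width = (high - low) * margin
    if width == 0:
        return 0.0
    gap = (low - value) if value < low else (value - high)
    return max(0.0, 1.0 - gap / width)

def best_class(vector, ranges, features, classes):
    """Class with the largest minimum membership (fuzzy AND) over the features."""
    def score(cls):
        return min(membership(vector[f], *ranges[cls][f]) for f in features)
    return max(classes, key=score)

def sequential_classify(vector, ranges, mixing, none_label="no disturbance"):
    """mixing: feature with mixing nature -> disturbances it can indicate alone."""
    results = {}
    mixed_classes = set()
    for feature, candidates in mixing.items():
        # Each mixing feature decides on its own, with no disturbance as fallback.
        results[feature] = best_class(vector, ranges, [feature],
                                      candidates + [none_label])
        mixed_classes.update(candidates)
    rest = [f for f in vector if f not in mixing]
    main_classes = [c for c in ranges if c not in mixed_classes]
    results["main"] = best_class(vector, ranges, rest, main_classes)
    return results

# Hypothetical per-unit ranges for three features.
base = {"rms": (0.95, 1.05), "thd": (0.0, 0.05), "dih": (0.0, 0.02)}
ranges = {
    "no disturbance": dict(base),
    "voltage sag":    dict(base, rms=(0.1, 0.9)),
    "lharmonic":      dict(base, thd=(0.05, 0.2)),
    "mharmonic":      dict(base, thd=(0.2, 0.5)),
    "interharmonic":  dict(base, dih=(0.02, 0.3)),
}
mixing = {"thd": ["lharmonic", "mharmonic"], "dih": ["interharmonic"]}

# A sag with some harmonic content and no interharmonics.
result = sequential_classify({"rms": 0.6, "thd": 0.1, "dih": 0.01}, ranges, mixing)
print(result)
```

In this example the thd feature reports lharmonic, the dih feature reports no disturbance, and the remaining feature identifies voltage sag as the main disturbance, so the vector is read as a sag mixed with low harmonics.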


7 Conclusion 


The focus of this project was to design a basic classifier system for power
quality (PQ) events. We introduced the general pattern classification system
and the principles of classification in Chapter 2, where the strategy for the
design of pattern classification systems was presented. In Chapter 3 this
procedure was applied to PQ. Possible classes and features of PQ events were
analysed, and out of them 6 signal parameters and 6 basic PQ disturbances were
chosen for classification. By applying principal component analysis we found
that 4 of the 6 features contain nearly the same information. The basic
classification strategies of 5 different classifiers, plus one with sequential
structure, are discussed in Chapter 4. In Chapter 5 we evaluated the
generalisation performance of the 5 classifiers while changing feature ranges
and data distributions to improve the classification performance, and found
that 3 classifiers out of 5 are suitable for the PQ classification application.
The realised classifier uses a parallel structure of the classifiers with the
best performance, namely the Bayes, the fuzzy and the k-NN classifiers, and was
developed in Matlab with various graphical user interfaces (GUIs). We also
developed a GUI for the fuzzy sequential classifier for the classification of
superimposed disturbances. The realised system is scalable, which means that
additional disturbance types and different features can be added or changed.
The system also provides information about the quality of the classifier's
decision. When unknown PQ events occur, the system is able to expand its
database, which helps it to identify a similar event correctly the next time it
appears.

Future development aspects include the design of an interface to an existing
data acquisition personal computer (DAQ PC) card to measure real data in power
systems. Here the number of disturbances to classify was set to 6 different
types, so this set can be extended to more disturbance types. The assessment
should also be completed with a statistical evaluation of the classified
events. In addition there are possible improvements to the classifier itself,
e.g. making the fuzzy sequential classifier more flexible in classifying
various kinds of mixed disturbances.



A Additional Figures 



Figure A.1: Bayes classifier error matrix. The feature range and the individual
class feature ranges are the same as in Figure 5.3, but the data
distribution is as shown in Figure A.14. We can observe that the Bayes
classifier may give bad results when the data is very irregular. Figure
A.14 shows that the data of the Interharmonics class for the Dominant
Interharmonic feature is distributed towards the limits, with no data at
the centre. As a result, about 40 % of the test data of this class is
misclassified as Oscillatory transients. This is not a usual case in a
real power system.
Figure A.2: Flow chart of testing Euclidean minimum distance classifier

Figure A.3: Flow chart of testing Bayes classifier

Figure A.5: Flow chart of testing Fuzzy pure classifier

Figure A.6: Fuzzy sequential classifier flow chart

Figure A.7: Euclidean classifier error matrix when duration and rise time are
not redistributed using the logarithmic function.

Figure A.8: Euclidean distance classifier error matrix. The data is the same as
in Figure 5.3, except that the THD feature ranges of every disturbance
are nonlinearly mapped to the feature space. The aim is to equalise the
ranges of the different classes for the same feature. The results for
the Harmonics class are improved compared to Figure 5.3. Re-sectioning
of the THD feature reduces its misclassification with neighbouring
classes such as voltage sag and voltage swell. So we can judge that
uniformity of the feature ranges of different classes for a particular
feature helps to improve the classifier performance.

Figure A.9: Euclidean distance classifier error matrix. The data is the same as
in Figure 5.3, except that the peak voltage, RMS voltage and THD
features are nonlinearly sectioned. The aim of the sectioning is stated
in Figure A.8. If the re-sectioning procedure is also used for the peak
voltage and RMS voltage features, the results for the second and third
class belongingness of the Harmonics class improve. Due to
re-sectioning, the distances from the Harmonics mean vector to the
voltage sag and voltage swell mean vectors become equal. As a result,
the second and third class belongingness of Harmonics to voltage sag and
voltage swell become equal, owing to the uniform distribution of the
data samples. In this respect the results differ from those in Figure
A.8. The same can be observed for the data samples of the
Interharmonics class.

Figure A.10: Euclidean distance classifier error matrix. The number of data for
every class, the feature range and the individual class feature ranges
are the same as in Figure 5.3, but the data is abnormally distributed
as shown in Figure A.14. The results become less accurate than those
for uniformly distributed data shown in Figure 5.3, because the mean
vector of each class shifts away from its cluster centre.
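
The role of the class mean vectors in the Euclidean distance classifier, and the ranking used for the second and third class belongingness, can be sketched in Python as follows. The sketch assumes a plain nearest-mean rule. The function names and the two-feature knowledge vectors are hypothetical.

```python
import math

def class_means(knowledge):
    """knowledge: class -> list of feature vectors; returns class -> mean vector."""
    return {cls: [sum(column) / len(vectors) for column in zip(*vectors)]
            for cls, vectors in knowledge.items()}

def rank_classes(x, means):
    """Classes ordered by Euclidean distance from x to the class mean, nearest first."""
    return sorted(means, key=lambda cls: math.dist(x, means[cls]))

# Hypothetical knowledge vectors: (peak voltage, RMS voltage) in per unit.
knowledge = {
    "voltage sag":    [[0.4, 0.5], [0.6, 0.5]],
    "no disturbance": [[1.0, 1.0], [1.0, 1.0]],
    "voltage swell":  [[1.4, 1.5], [1.6, 1.5]],
}
means = class_means(knowledge)
ranking = rank_classes([0.7, 0.6], means)
print(ranking)
```

The first entry of the ranking is the class decision and the following entries give the second and third nearest classes. Because only the mean vectors enter the decision, any distribution that moves a class mean away from its cluster centre changes the decision boundaries.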

Figure A.11: k-NN classifier error matrix. The data distribution is the same as
in Figure A.10. There are 100 data samples for every class, of which
60 are for training and 40 for testing. The first and second columns
represent the error matrices for the k=1 and k=10 nearest neighbour
classifiers respectively. The third and fourth columns represent the
second nearest and third nearest class error matrices respectively
for the k=10 NN classifier. The results are better than those of the
Euclidean distance classifier for the same type of data, as seen in
Figure A.10. This is because the nonlinearity of the decision surface
increases as the number of nearest neighbours decreases. The
classifier therefore functions as a piecewise linear classifier, which
can adapt to the abnormal distribution of the data better than the
Euclidean distance classifier. The results become less accurate
compared to those with the uniform distribution shown in Figure 5.5,
due to decentralisation of the data samples within every class
cluster. In abnormally distributed data the randomness of the test
samples differs from that of the training dataset. The results of the
k=1 and k=10 NN classifiers are almost the same here. More information
about neighbouring classes is available from this classifier through
the third and fourth column results, which give a good idea of the
orientation of the data vectors of the different classes.

Figure A.12: Data distributed uniformly for 6 disturbances and 6 features, with
100 data points for every disturbance. The 100 data samples for every
disturbance are shown. The features peak voltage, RMS voltage, THD and
Dominant Interharmonics are normalised to 1 per unit. The duration and
rise time features are first generated with 0-100 and 0-20 scales
respectively, and then the log10 function is applied to redistribute
them. They therefore do not appear distributed like the others,
although they originally have the same distribution. Their scaling
after redistribution is comparable to that of the other features.

Figure A.13: Data is the same as in Figure A.12 except that it has a Normal
distribution.

Figure A.14: Data is the same as in Figure A.12 except that it has an abnormal
distribution, with 1000 data samples for every class. It is different
for the different class feature ranges. It is generated by randomly
sectioning the individual class feature range and the number of data
samples, and distributing the respective numbers of data samples in
the respective sub feature ranges. As we cannot be sure about the
exact distribution of real system disturbance data for every feature,
this may be one of the possible distributions for testing the
classifiers.
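
One way to generate such an abnormal distribution is sketched below in Python. It is an illustration under assumptions and not the generator used in this work. The number of sections is a free parameter, the section boundaries and the split of the sample count are drawn uniformly at random, and the samples inside each sub range are drawn uniformly.

```python
import random

def abnormal_samples(low, high, n_samples, n_sections, rng):
    """Split [low, high] and n_samples into random sections and fill each one."""
    # Random boundaries of the sub feature ranges.
    cuts = sorted(rng.uniform(low, high) for _ in range(n_sections - 1))
    edges = [low] + cuts + [high]
    # Random split of the sample count over the sections.
    splits = sorted(rng.randint(0, n_samples) for _ in range(n_sections - 1))
    bounds = [0] + splits + [n_samples]
    counts = [b - a for a, b in zip(bounds, bounds[1:])]
    samples = []
    for lo, hi, count in zip(edges, edges[1:], counts):
        # Assumption: uniform distribution inside each sub feature range.
        samples.extend(rng.uniform(lo, hi) for _ in range(count))
    return samples

rng = random.Random(1)
samples = abnormal_samples(0.0, 1.0, n_samples=1000, n_sections=5, rng=rng)
print(len(samples), min(samples), max(samples))
```

Because the section widths and sample counts are chosen independently, some sub ranges end up densely populated and others nearly empty, which produces the irregular class clusters discussed for Figures A.1, A.10 and A.11.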


















Notation 


{0,1}^n          n-dimensional binary hypercube
[a,b]            closed interval of real-valued numbers between a and b
[symbol lost]    transpose of column vector a
[symbol lost]    inner product of vectors a and b
C^-1             inverse of matrix C
|C|              determinant of matrix C
[symbol lost]    Euclidean distance between mean and vector x
||x||            Euclidean norm of vector x
[symbol lost]    mathematical expectation or mean


Abbreviations 

DIH      Dominant Interharmonics
GUI      Graphical User Interface
ln       natural logarithm
log10    base 10 logarithm
max      maximum
min      minimum
NNC      nearest neighbour classifier
NN       nearest neighbour
OR       OR gate
PQ       power quality
RCE      restricted Coulomb energy
RMS      root mean square
THD      Total Harmonic Distortion

Symbols

AND              logical and operation
C                autocorrelation matrix
c                eigenvector of matrix C
exp(a)           exponential function
f                frequency
f_m              maximum frequency
f_v              normal frequency
[symbol lost]    error matrix
[symbol lost]    identity matrix
[symbol lost]    criterion objective function
[symbol lost]    number of classes
[symbol lost]    mean vector of N_i vectors of class
N                number of data samples
n                number of features
N_i              number of data samples in class
NOT              logical not operation
OR               logical or operation
P(i/x)           conditional probability of unit i winning upon the presentation of x
R                space of n-dimensional real-valued numbers, also used to designate
                 Euclidean space
S(x)             classifier mapping function from feature space to decision space
T_d              duration of event
t_r              rise time
V_m              peak voltage
V_rms            voltage magnitude
x, x_0           input vector
[symbol lost]    feature component value
Δf_m             small change in frequency
λ                eigenvalue of matrix C
[symbol lost]    mean vector of class
[symbol lost]    fuzzy membership function
[symbol lost]    membership value of x_j for class
[symbol lost]    output class set
ω                output class
σ^2              variance
K                covariance matrix
∈                symbol for belongs to